Psychological Testing
Functions of measurement
1. Measurement has varied functions.
2. In selection: selection of personnel in industry or other institutions may be carried out with the
help of psychologists. The function is to predict the ability of an individual.
3. In classification: measurement helps the teacher to classify the children as low or high
achievers; retarded, average and gifted.
4. In comparison: the pioneering work of Galton and Darwin has revealed that no 2 individuals are
alike. Whenever 2 persons are to be compared on any of the factors, measurement comes into
use.
5. In guidance and counselling: measurement helps an individual know his strengths and
weaknesses, may provide insight into the relationship between the counsellor and the client,
predict problems of adjustment likely to come up in the future, and also diagnose mental
disabilities, aberrations and deficiencies.
6. In research: measurement helps in research activities, undertaken to discover new facts about a
problem. The effect of one variable or a set of variables is studied while the effects of all other
variables are controlled.
7. In improving classroom instruction: measuring the outcomes of instruction makes clear how
many students are benefiting from it, how many are not, and how many are benefiting only a
little. After such an analysis, essential suggestions for modifying the instruction can be made.
Physical vs psychological measurement
1. Physical: the unit of measurement is fixed and constant throughout the measurement.
Psychological: there is no fixed unit of measurement; for example, some may measure
intelligence in terms of verbal questions or items answered in a specified time.
2. Physical: there is a true zero point, a point which represents the absence of the trait being
measured. Psychological: the zero point is arbitrary and does not represent the absence of the
trait being measured.
3. Physical: is more accurate and predictable. Psychological: the zero point itself is not known,
so no prediction can be made with definite accuracy.
4. Physical: the entire quantity can be measured. Psychological: the entire quantity cannot be
measured; only a sample representing that quantity or trait can be measured.
Step 7: Standardization
● Establishing the validity, reliability and norms of the test
● Also establishing cross-validation and co-validation
● Cross-validation: revalidation of a test on a sample of test takers other than those on whom test
performance was originally found to be a valid predictor of some criterion
● Co-validation and co-norming: test validation process conducted on 2 or more tests using the
same sample of test takers. A highly beneficial process for the publisher, test user and the test takers.
Step 8: test manual and publication of the test
● Preparation of manual for the test which includes the psychometric properties of the test, norms
and references.
● Offers clear details regarding test procedures, administration, scoring methods and time limits
● The manual also includes standardization details of the test
Item analysis
● Carried out after items are written, reviewed and carefully edited
● Uses statistics and expert judgements to evaluate tests based on the quality of individual items,
item sets and entire set of items as well as the relationship of each item to the other items
● Investigates the performance of items considered individually either in relation to some external
criterion or in relation to the remaining items on the test.
● Validity is tested on a group of examinees by comparing their performance on each individual
item to their performance on the whole test
● Item analysis provides us with the estimate of validity of each item.
● Concepts are similar for norm-referenced and criterion-referenced tests but differ in specific
and significant ways.
● It gives 2 kinds of info: the difficulty index of the item and the index of validity
(discriminative power) of the item.
Item difficulty
● Percentage of people who answer an item correctly.
● Relative frequency with which the examinees choose the correct response.
● Ranges from 0 to +1.0
● Higher difficulty indexes indicate easier items. Item answered correctly by 75% of examinees
has item difficulty of 0.75
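The difficulty index is just the proportion of correct responses; a minimal sketch, with invented response data:

```python
# Item difficulty (p-value): the proportion of examinees who answer an
# item correctly. Higher values mean easier items.
def item_difficulty(responses):
    """responses: list of 1 (correct) / 0 (incorrect) for one item."""
    return sum(responses) / len(responses)

# 6 of 8 examinees answered correctly -> difficulty index of 0.75
print(item_difficulty([1, 1, 1, 0, 1, 1, 0, 1]))  # 0.75
```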
Item discrimination
● Indicates how adequately an item separates or discriminates between high scorers and low
scorers on an entire test.
● Also known as validity index
● Marshall and Hales in 1972 said that discriminatory power or validity indicates the extent to
which success and failure on that item indicate the possession of the trait of achievement being
measured
Divided into 3:
● Positively discriminating item: the proportion or percentage of correct answers
is higher in the upper group.
● Negatively discriminating item: the proportion or percentage of correct answers
is lower in the upper group.
● Non-discriminating item: the proportion or percentage of correct answers is
equal or approximately equal in both the upper and lower groups.
● Number of high scorers and low scorers who answer an item correctly
● The higher the discrimination index, the better the item because high values indicate that the
item discriminates in favour of the upper group which should answer more items correctly. If
more low scorers answer the item correctly, it will have a negative value and is probably flawed.
● Good items have a discrimination index of 0.40 or higher, reasonably good items 0.30-0.39,
marginal items from 0.20-0.29 and poor items less than 0.20.
● 2 ways of determination of index of discrimination
● By applying a test of significance of the difference between 2 proportions or percentages:
arrange the total scores from highest to lowest, then select the top and bottom 27% as the upper
and lower groups. This is Kelley's 27% extreme-group method.
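A minimal sketch of the extreme-group method, assuming dichotomously scored items (the function name and data are hypothetical):

```python
# Kelley's 27% extreme-group method: rank examinees by total score, take
# the top and bottom 27%, and subtract the lower group's proportion of
# correct answers on the item from the upper group's proportion.
def discrimination_index(item_correct, total_scores, fraction=0.27):
    """item_correct[i]: 1/0 on this item; total_scores[i]: total test score."""
    n = max(1, round(len(total_scores) * fraction))
    ranked = sorted(range(len(total_scores)),
                    key=lambda i: total_scores[i], reverse=True)
    upper, lower = ranked[:n], ranked[-n:]
    p_upper = sum(item_correct[i] for i in upper) / n
    p_lower = sum(item_correct[i] for i in lower) / n
    return p_upper - p_lower   # positive: item favours high scorers

item = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]
totals = [95, 90, 85, 80, 75, 70, 65, 60, 55, 50]
print(discrimination_index(item, totals))  # 1.0: strongly discriminating item
```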
● By applying correlational techniques: the item-remainder coefficient is the correlation of each
item with the combination (sum or average) of all the remaining items, not counting that item.
The larger the item-remainder coefficient, the more the item in question relates to the remaining items.
● Kingston and Kramer recommended an item-remainder coefficient greater than 0.30 as the
selection/retention criterion for an item-total correlation
● The item total correlation is a correlation between the question score and the overall assessment
score
● Two statistics commonly used to determine whether a test item is valid and reliable are the
point-biserial correlation and the p-value
● Point biserial correlation: The point-biserial correlation coefficient is a statistical measure used
to assess the relationship between a continuous variable (typically a test score) and a
dichotomous variable (usually representing correct or incorrect responses on a test item).
● P-value: gives the proportion of students who got the item correct and is a proxy for item
difficulty, or more precisely item easiness
● The statistic ranges from 0 to 1
● Problematic items may still show high p-values, so a high p-value alone should not be taken as
a sign of item quality
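A sketch of the point-biserial computation via the shortcut formula r_pb = (M1 − M)/S · √(p/q), using only the standard library (the data are invented):

```python
# Point-biserial correlation: Pearson's r between a dichotomous item
# score (1/0) and the continuous total score, computed here via the
# shortcut formula r_pb = (M1 - M) / S * sqrt(p / q).
import statistics

def point_biserial(item, totals):
    n = len(item)
    mean_all = statistics.fmean(totals)
    sd_all = statistics.pstdev(totals)              # population SD
    p = sum(item) / n                               # item p-value
    q = 1 - p
    mean_correct = statistics.fmean(
        t for i, t in zip(item, totals) if i == 1)  # mean of the "1" group
    return (mean_correct - mean_all) / sd_all * (p / q) ** 0.5

# Examinees who got the item right also have higher totals -> positive r
print(round(point_biserial([1, 1, 0, 0], [10, 8, 6, 4]), 3))  # 0.894
```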
● The item characteristic curve is a graphical representation of the probability of giving the
correct answer to an item as a function of the level of the underlying characteristic (ability)
of the examinee assessed by the test
● Basic building block of item response theory: bounded between 0 to 1
● Monotonically increasing
● Commonly assumed to take the shape of a logistic function
● Each item has its own ICC
● Illustrate discrimination power and item difficulty
● Steepness or slope conveys info about discriminatory power of the item
● Position of the curve gives indication about the difficulty of each item
● For difficult items, the ICC starts to rise on the right hand side of the plot
● For easier items, ICC curve starts to rise on the left hand side of the plot
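The logistic ICC described above can be sketched as follows; the a and b values are illustrative, not from any real test:

```python
# Item characteristic curve under a two-parameter logistic model:
# P(theta) = 1 / (1 + exp(-a * (theta - b))). The slope a reflects
# discrimination and the location b reflects difficulty.
import math

def icc(theta, a=1.0, b=0.0):
    """Probability of a correct response at ability level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

print(icc(0.0, a=1.5, b=0.0))  # 0.5 -- the curve crosses .5 at theta == b
print(icc(0.0, a=1.5, b=1.0))  # harder item (b = 1): lower probability
```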
● Analyzing the distractors (incorrect alternatives) is useful in determining the relative usefulness
of them in each item. Items should be modified if students consistently fail to select certain
multiple choice alternatives.
● Alternatives that are totally implausible are of little use as decoys in multiple choice items
● A discrimination index or discrimination coefficient should be obtained for each option in order
to determine each distractor's usefulness. Whereas the discrimination value of the correct answer
should be positive, the discrimination values for the distractors should be lower and preferably
negative
Test standardization
Reliability
● Describes how replicable or repeatable a study is, while validity describes how accurately the
study measures what it intends to measure
● A reliability coefficient is an index of reliability, a proportion that indicates the ratio between
the true score variance on a test and the total variance
● Reliability is a property of scores
● Based on CTT (classical test theory), any obtained score is divided into a true score and an error component
● The total variance of the test is also divided into components: true variance and error variance
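In symbols, classical test theory writes the observed score as true score plus error, and the reliability coefficient as the ratio of true-score variance to total variance:

```latex
X = T + E, \qquad \sigma^2_X = \sigma^2_T + \sigma^2_E, \qquad
r_{xx} = \frac{\sigma^2_T}{\sigma^2_X} = 1 - \frac{\sigma^2_E}{\sigma^2_X}
```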
Types of reliability
1. External: reliability over time and consistency across raters is known as external reliability.
It refers to the extent to which a measure varies from one use to another.
Test-retest: Estimate of reliability obtained by correlating pairs of scores from the same
people on 2 different administrations of the same test. It is an apt measure when we wish to
measure something that is stable over time; if a characteristic fluctuates, it is not a
meaningful measure. As the time interval between the two testings increases, the correlation
between the 2 scores decreases. When the interval is greater than 6 months, the estimate of
test-retest reliability is often referred to as the coefficient of stability.
Disadvantage: results take a long time to be obtained.
Parallel forms: Two different tests designed in the same way are administered, and the outcomes
are correlated through statistical measures (Pearson's r) to determine reliability. The sources
of error variance are content sampling, time sampling and content heterogeneity. When the 2nd
form is administered immediately it is called parallel-form (immediate) reliability; when after
a gap, parallel-form (delayed) reliability. The correlation between the 2 parallel forms is the
estimate of reliability, and the degree of relationship is called the coefficient of
equivalence. Because the same test is not repeated, memory, practice, carryover and recall
effects are minimized and do not affect the scores. It is difficult to construct 2 parallel
forms of a test, especially in situations like the Rorschach, and test scores on the 2nd form
are generally high.
Inter-rater: Inter-rater reliability is the extent to which the observations made by different
observers are consistent.
2. Internal: consistency across items assesses the consistency of results across items within a
test.
Split-half reliability: An improvement over the earlier 2 methods, involving both the
characteristics of stability and equivalence. This method provides the internal consistency of
test scores. All items are generally arranged in increasing order of difficulty and the test is
administered once to the sample; after administration it is divided into 2 comparable or equal
halves. The Spearman-Brown prophecy formula is used to estimate the full-length reliability.
When calculated with the Rulon-Guttman formula, the variance of the differences between each
person's scores on the 2 half-tests and the variance of the total scores are considered. The
Flanagan formula is very close to Rulon's formula; in it, the variances of the 2 halves are
added instead of taking their difference. The method cannot be used for estimating the
reliability of speed tests or of heterogeneous tests, and chance errors may affect the scores
on the 2 halves in the same way, tending to make the reliability coefficient too high.
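The split-half procedure above can be sketched as follows, assuming odd-even halves and hypothetical half-test scores:

```python
# Split-half sketch: correlate the two half-test scores, then apply the
# Spearman-Brown prophecy formula to estimate full-length reliability.
import statistics

def pearson_r(x, y):
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (statistics.pstdev(x) * statistics.pstdev(y))

def spearman_brown(r_half, factor=2):
    """Projected reliability of a test `factor` times the half length."""
    return factor * r_half / (1 + (factor - 1) * r_half)

odd_half  = [10, 12, 9, 14, 11]   # each person's score on the odd items
even_half = [11, 12, 8, 15, 10]   # ... and on the even items
r = pearson_r(odd_half, even_half)
print(spearman_brown(r))  # full-test reliability estimate, higher than r
```

For example, a half-test correlation of .60 steps up to 2(.60)/(1 + .60) = .75 for the full-length test.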
Cronbach’s alpha
1. Commonly used as a measure of internal consistency or reliability
2. Developed by Lee Cronbach in 1951
3. As an extension of Kuder Richardson formula (KR20)
4. The method uses the variances of the individual item scores and of the total score to work out the reliability.
5. It is the average of all possible split-half coefficients. It varies from 0 to 1, and a value of
0.6 or less generally indicates unsatisfactory internal consistency reliability.
6. Its value tends to increase with an increase in the number of scale items.
7. Coefficient alpha may be artificially and inappropriately inflated by including several redundant scale items.
8. Another is coefficient beta.
9. Coefficient beta assists in determining whether the averaging process used in calculating
coefficient alpha is masking any inconsistent items.
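Alpha's definition, k/(k−1) times one minus the ratio of summed item variances to total-score variance, can be sketched with toy data:

```python
# Cronbach's alpha from a persons x items score matrix:
# alpha = k/(k-1) * (1 - sum of item variances / variance of total scores).
import statistics

def cronbach_alpha(scores):
    """scores: one list per person, one score per item."""
    k = len(scores[0])
    item_cols = list(zip(*scores))                    # transpose to items
    sum_item_var = sum(statistics.pvariance(c) for c in item_cols)
    total_var = statistics.pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum_item_var / total_var)

# Perfectly consistent items give alpha = 1.0
print(cronbach_alpha([[1, 1], [2, 2], [3, 3]]))  # 1.0
```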
Length of the test: The more items the test contains, the greater its reliability, and vice
versa; the larger the sample of items we take from a given area of knowledge or skill, the more
reliable the test will be.
Homogeneity of items: Has 2 aspects, item reliability and the homogeneity of traits measured
from one item to another. If the items measure different functions and the inter-correlations
of items are zero or near zero, then reliability is zero or very low, and vice versa.
Difficulty value of items: The difficulty level and the clarity of expression of a test item
also affect the reliability of test scores. If the test items are too easy or too difficult for
the group members, they will tend to produce scores of low reliability.
Discriminative value: When items discriminate well between superior and inferior examinees, the
item-total correlation is high and the reliability is also likely to be high, and vice versa.
Item selection: If there are too many interdependent items in a test, the reliability is found
to be low.
Reliability of the scorer: If the scorer is moody or fluctuating, the scores will vary from one
situation to another.
Group variability: When the group of pupils being tested is homogeneous in ability, the
reliability of test scores is likely to be low, and vice versa.
Guessing and chance errors: Guessing gives rise to increased error variance and as such reduces
reliability.
Momentary fluctuations: May raise or lower reliability. A broken pencil, anxiety regarding
non-completion of homework, and similar momentary factors may affect the reliability.
Poor instructions: Test takers may not be able to provide accurate answers if they don't
understand how to take the test or don't understand some of the specific test questions.
Errors in reliability
There is always a chance of 5% error in reliability which is acceptable
Types of errors
1. Random error: Exists in every measurement and is often a major source of uncertainty. It has
no particular assignable cause, can never be totally eliminated or corrected, and is caused by
uncontrollable variables; it is an inevitable part of every analysis made by human beings. Even
when some random errors are identified, they cannot be measured because they are often so small.
2. Systematic error: Caused by instruments, machines and measuring tools, not by the individuals
themselves.
Validity
1. Extent to which the test measures what it is supposed to measure.
2. Traditionally divided into 4 categories.
3. In 1966, the American Psychological Association combined predictive and concurrent validity
into a single grouping called criterion validity.
Types of validity
Internal validity: Looks to see if there are any methodology issues and whether the study is
structurally sound. If a test is internally valid, it shows a clear cause-and-effect
relationship with no plausible alternative explanation. Researchers must remove errors from the
trials they conduct and anything that could contaminate the findings, and may use techniques
such as blinding, random selection, experimental manipulation and study protocols to ensure
validity.
External validity: Extent to which test results generalize beyond the sample; a test that is
externally valid will apply to the real world. It relies on the test's ability to be replicated
across different settings, people and even time periods. If a test only works in one setting or
with specific groups of people, it cannot be deemed externally valid. To ensure it, calibrate
the test as needed, replicate it in various environments with various participants, and ensure
participants act as closely as possible to their normal behavior during testing.
1. Face validity: Extent to which the test seems relevant, important and interesting; the least
rigorous measure of validity. It pertains to whether the test looks valid or not. Methods to
measure it include polling participants, follow-up questionnaires, etc.
2. Content: Degree to which a test matches a curriculum and accurately measures the specific
training objectives on which a test program is based. It uses the judgement of qualified experts
to determine if a test is accurate, appropriate and fair.
3. Criterion: Measures how well a test compares with an external criterion. It reflects whether a
scale performs as expected in relation to other selected variables (criterion variables) chosen
as meaningful criteria.
1. Validity is a relative term, since a test is valid only for a particular purpose; if used
elsewhere it becomes invalid. Validity is not a fixed property of the test, because validation
is an unending process that must be revised with the discovery of new concepts and new meanings.
2. Validity is a matter of degree and not all or none property
3. Validity is a unitary concept: it does not come in various types but in various aspects, such
as content-related, criterion-related or construct-related.
Norms
1. A norm is an average or typical score on a particular test obtained by a set/group of defined individuals
2. Based on the distribution of scores obtained by the people of the standardization group.
3. The sample should be large enough to provide stable values
4. Sample must be representative of the population under consideration
5. Scores on psychological tests are most commonly interpreted by reference to norms which
represent the test performance of the standardization sample
6. Indicate an individual’s relative position in the normative sample
7. Provide comparable measures which permit direct comparison of performance on different
tests.
8. Norm referencing is when the raw score is compared with the scores of a specific group of
examinees on the same test. Therefore each examinee is compared with a norm.
9. In order to compare the raw score with the performance of the standardised sample, they are
converted into derived scores.
10. Percentile norms are also reported through an ogive, a graph that shows the cumulative
percentage of scores falling below the upper limit of each class interval.
11. An ogive is used for determining how many of a set of observations are less than or equal to
a specific value.
Percentile norms: Percentile rank is a type of converted score that expresses an individual's
score relative to their group in percentile points. If the percentile norm is to be meaningful,
the sample should be homogeneous with respect to gender, age and other factors. Percentiles are
easy to calculate, understand and interpret, and make no assumption about the characteristics
of the population.
Decile: Points which divide the scale of measurement into 10 equal parts; the deciles range from
decile 1 to decile 9. A decile score of 1 indicates the lowest 10% of the group.
Standard score: A set of scores with the same mean and standard deviation. Converting raw scores
into standard scores allows the scores to be compared, e.g. to see how many people scored above
or below a given point.
Percentiles: Standard scores can also be used to illustrate how well someone did in comparison
to others, i.e. the relative standing of the person's standard score in comparison to others.
They can be used to determine how many people scored between 2 scores and have an equal unit of
measure. Raw scores can be converted into standard scores by 2 methods: linear and normative
transformation. In a linear transformation, all characteristics of the original distribution of
raw scores are retained without any change in the shape of the distribution. In a normative
transformation, skewed distributions of raw scores are adjusted to produce a normal frequency
distribution and converted to a standard base.
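The derived scores described above (z-scores, T-scores and percentile ranks) can be sketched as follows; the norm group below is hypothetical:

```python
# Derived scores for norm referencing: a z-score is a linear transformation
# of the raw score, a T-score rescales z to mean 50 / SD 10, and a
# percentile rank is the percentage of the norm group at or below a score.
import statistics

def z_score(x, norm_group):
    return (x - statistics.fmean(norm_group)) / statistics.pstdev(norm_group)

def t_score(x, norm_group):
    return 50 + 10 * z_score(x, norm_group)

def percentile_rank(x, norm_group):
    return 100 * sum(s <= x for s in norm_group) / len(norm_group)

norms = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85]
print(percentile_rank(70, norms))  # 70.0: at or above 70% of the group
print(t_score(70, norms))          # a little above the T-score mean of 50
```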
Qualitative: Developmental norms are developed for psychological constructs which develop over
time. They are supplemented by percentiles or standard scores, and have appeal for descriptive
purposes and for certain research purposes; examples are age, grade and gender norms.
Age norms: Relates the level of performance to the age of the people taking the test; the median
score on a test obtained by persons of a given chronological age.
Grade norms: The median score on a test obtained by students of a given grade level. Most
popular when reporting achievement levels of school children, and useful for teachers in
understanding how well students are progressing at a grade level.
Nominal: Involves variables that represent some type of non-numerical or qualitative data.
Values are grouped into categories that have no meaningful order; nominal variables are also
known as categorical data. Typical descriptive statistics associated with nominal data are
frequencies and percentages, and the chi-square test is the appropriate test. Nominal data
cannot be used to perform many statistical computations such as the mean and standard deviation.
Ordinal: Represents some type of relative ranking or an order of categories (rank-order data).
Ordinal variables are nominal-level variables with a meaningful order, and can also be
categorical if the categories are measured in a specific order. Described with frequencies and
percentages. The Mann-Whitney U test is most apt for an ordinal-level dependent variable and a
nominal-level independent variable. People often use surveys that involve Likert scales. Means,
SDs and parametric statistical tests are not apt to use with ordinal data.
Interval: Represents variables in which numerical values represent equal amounts of whatever is
being measured. Also referred to as equal-interval data; values are grouped in evenly
distributed steps. Temperature is an example. Interval data can be used to compute commonly
used statistical measures such as the average, SD and Pearson correlation coefficient.
Ratio: A type of equal-interval measurement that has an absolute zero point; a value of zero
indicates a complete absence of what is being measured. A lot of things are measured using
ratio scales, and all arithmetic operations are possible on a ratio variable. ANOVA is most apt
for a continuous-level dependent variable and a nominal-level independent variable. Ratio data
is used as the dependent variable for most parametric statistical tests such as t-tests,
F-tests and regression.
Scaling techniques
1. In scaling, the objects are text statements usually statements of attitudes, opinion or feeling
2. Involves creating a continuum upon which measured objects are located
3. Can be classified into 2 categories: comparative and non-comparative scales
Paired comparison scale: A comparative scaling technique in which respondents are presented with
2 objects at a time and asked to select one according to some criterion. The data is ordinal in
nature. Useful when the number of brands is limited, since it requires direct comparison and
overt choice; however, the scale has little resemblance to the market situation, which involves
selection from multiple alternatives.
Rank order scale Respondents are presented with several items simultaneously and asked to rank them
in order of priority. It describes the favoured and unfavoured objects but does not
reveal the distance between the objects. More realistic in obtaining the responses and
yields better results when direct comparison is required and limitation is that only
ordinal data can be generated
Constant sum: Respondents are asked to allocate a constant sum of units, such as rupees or
points, among a set of stimulus objects with respect to some criterion; for example, they might
be asked to divide the sum to indicate the relative importance of different attributes.
Q-sort scale: Uses a rank-order procedure to sort objects based on similarity with respect to
some criterion. It is more important to make comparisons among the different responses of a
single respondent than between the responses of different respondents.
2. Non-comparative: Respondents need only evaluate a single object, employing whatever rating
standard seems apt to them.
Continuous rating: Simple and highly useful. Respondents rate the objects by placing a mark at
the apt position on a continuous line that runs from one extreme of the criterion variable to
the other. The respondent's score is determined either by dividing the line into as many
categories as desired and assigning a score based on the category into which the mark falls, or
by measuring the distance from either end of the scale.
Itemised rating: A scale having numbers or brief descriptions associated with each category. The
categories are ordered in terms of scale position, and respondents select the one of a limited
number of categories that best describes the product, brand, company or attribute being rated.
The 3 main itemised scales are the Likert, semantic differential and Stapel scales. The Likert
scale, developed by Rensis Likert, is popular for measuring attitudes because the method is
simple to administer; respondents indicate their attitude by checking how strongly they agree
or disagree with carefully worded statements that range from very positive to very negative
towards the attitudinal object. The semantic differential scale is a 7-point rating scale with
endpoints associated with bipolar labels (such as good and bad) that have semantic meaning; it
is used to find whether a respondent has a positive or negative attitude towards an object, and
has been used to develop advertising and promotion strategies. In the Stapel scale, only the
extremes have names, which represent the bipolar adjectives, with the central category
representing the neutral position; the in-between categories have blank spaces, and a weight is
assigned to each position on the scale.
Intelligence tests
1. Series of questionnaires and exercises that are designed to measure the intelligence of an
individual.
2. Can be conducted on children as well as adults
3. The first intelligence test was published in 1905. Late in the 19th century, Alfred Binet
founded the 1st experimental psychology research laboratory in France.
4. In his lab, Binet attempted to develop experimental techniques to measure intelligence and
reasoning abilities.
5. Binet was successful in measuring intelligence, and in 1905 he and Theodore Simon published
the 1st test of mental ability, the Binet-Simon scale. The assessment was intended for children
between the ages of 3 and 13 years. It consisted of an arrangement of 56 questions and tasks
designed to distinguish children with cognitive deficits from those who were lower performing
due to lack of motivation or laziness.
6. It measures intelligence without regard for the test taker's educational experience.
7. Originally created in 1904 and has since undergone various revisions
8. In 1916, Lewis Terman produced the Stanford-Binet Intelligence Scales, an adaptation of
Binet's original test, used with Americans from age 3 to adulthood.
9. Consists of assessment in the areas of fluid reasoning, knowledge, quantitative reasoning,
visual-spatial processing and working memory.
10. Contains 10 subtests that take approximately 10 minutes each and is used for ages 2-85 years
11. The test is composed of mostly timed portions, which some researchers recognize as
problematic. At the conclusion, a single score is obtained.
12. In 1939, the Wechsler-Bellevue Intelligence Scale (WBIS) became popular.
13. Indexed the general mental ability of adults and revealed a pattern of a person’s intellectual
strengths and weaknesses.
14. The WBIS was developed by David Wechsler at Bellevue Hospital, based on an individual's
ability to act purposefully, think logically, and interact/cope successfully with the environment.
15. A 2nd edition appeared in 1946.
16. In 1955, Wechsler revised the scale and renamed it the WAIS.
17. In 1981 and 1997, the WAIS was updated as the WAIS-R and WAIS-III.
18. To enhance clinical utility and user friendliness of the test, 4th edition was published in 2008
19. Raven's Standard Progressive Matrices (RSPM), published in 1938, assesses abstract reasoning
through a nonverbal test. The series includes 3 assessments that can be given from children as
young as age 5 through to elderly people.
20. Designed to minimize cultural bias.
21. Progressive test, with increasing complexity.
22. 1938 original form was entirely black and white
23. 2nd test is in color which is said to be more visually stimulating and is intended for people with
cognitive disabilities or advanced age
24. The 3rd test contains more tasks and is intended for people in the age of adolescence through
adulthood who are believed to have advanced cognition.
Performance tests
1. Require the test takers to perform a particular well defined task such as making a right hand
turn, arranging blocks
2. Test takers try to do their best because their scores are determined by their success in
completing the task
3. Driving test, test of specific abilities and classroom tests
Aptitude tests
1. Achievement tests measure a test taker’s knowledge in a specific area at a specific point in time.
Aptitude test assesses a test taker’s potential for learning or ability to perform in a new job or
situation.
2. Achievement tests measure the product of cumulative life experiences, or what one has acquired over time.
Personality tests
1. Measure human character or disposition
2. The early 1900s brought an interest in measuring the personality of individuals
3. The personal data sheet: during world war 1, the US military wanted a test to help detect
soldiers who would not be able to handle the stress associated with combat.
4. APA commissioned Robert Woodworth to design such a test, which came to be known as the PDS: a
paper-and-pencil psychiatric interview to which recruits responded yes or no. It originally had
200 questions, later reduced to 116. After World War 1, Woodworth developed the Woodworth
Psychoneurotic Inventory, used with civilians; it was the 1st self-report test.
Projective personality tests
1. Exploring unconscious
2. Ambiguous images or given situations and asked to interpret them
3. Subjects are to project their own emotions, attitudes and impulses onto the stimulus given and
use these projections to explain an image, tell a story or finish a sentence.
4. Projective tests emerged in the 1920s and 1930s
5. Their popularity and use have risen and fallen at various times
6. Proponents argue that the way a person interprets the stimuli tells a lot about them as a
person, while opponents argue that the tests lack reliability and validity
7. Rorschach: the 1st projective test, developed by Hermann Rorschach in 1921. Useful in assessing depression, anxiety and psychosis
8. TAT: developed by Henry A. Murray and C. D. Morgan. Both tests are based on the theories of
Jung. 8-12 pictures are typically administered.
9. Draw-a-person: interpretation considers the size, features and any added details of the drawn person
10. House-tree-person: the examinee answers a series of questions about their drawings, such as
who lives in this house or how the person is feeling. Developed by John Buck, it included 60 questions
11. California Psychological Inventory: a structured personality inventory assessing 20
attributes of normal personality, such as dominance, independence and well-being. The 480-item
test was published by Harrison Gough in 1968. The scale was revised in 1987 and reduced to 462
items, with norms based on 6,000 males and 7,000 females. The 3rd revision was reduced by a
further 28 items, with norms based on 3,000 males and 3,000 females. It focuses on
interpersonal behaviour and social interaction.
12. Half of the items of the original version were taken directly from the MMPI
13. Criticism: some of the criterion groups used in establishing the scales consisted of people
identified by their friends as being high or low on the trait
14. 4 of the scales ( social presence, self acceptance, self control and flexibility) were developed by
selecting items that theoretically were designed to measure the construct.
NEO
1. Developed by Costa and McCrae in 1985 and revised in 1992
2. Measures 5 primary dimensions of personality, called the Big Five
3. For normal adults ranging from ages 20-80
4. 6 facets underlie each of the major constructs
5. 240 items (30 for each facet), with 3 additional validity-check items
6. Takes about 30 min to complete
7. N (neuroticism) indicates the degree to which a person is anxious and insecure; its 6 facets
are anxiety, hostility, depression, self-consciousness, impulsiveness and vulnerability.
8. O (openness) indicates the degree to which a person is imaginative and curious versus
concrete and narrow minded; its 6 facets are fantasy, aesthetics, feelings, actions, ideas and
values. E (extraversion) indicates the degree to which a person is sociable and outgoing; its 6
facets are warmth, gregariousness, assertiveness, activity, excitement seeking and positive emotions.
9. A (agreeableness) indicates the degree to which a person is warm and cooperative versus
unpleasant and disagreeable; its 6 facets are trust, modesty, compliance, altruism,
straightforwardness and tender mindedness. C (conscientiousness) indicates the extent to which
the person is persevering and responsible; its 6 facets are competence, self-discipline,
achievement striving, dutifulness, order and deliberation.
10. A short version of 60 items assesses only the 5 major constructs and takes 15 min to complete
11. Forms allow for self report and observer report
12. Response sheets can be hand or machine scored
Structured Interviews
1. Clinicians frequently interview clients, as a part of the assessment process, to gather
information they will use to help diagnose problems and plan treatment programs.
2. The typical clinical interview does not ordinarily qualify as a psychological test. The clinical
interview is not intended to measure samples of behavior in order to make inferences.
3. The clinician merely asks unstructured, yet purposeful questions to gather information.
4. Some semi-structured interviews, such as the Structured Clinical Interview for DSM-IV Axis I
Disorders (SCID), cover a wide range of mental health concerns.
5. Other semi-structured interviews are concerned with a single diagnosis, such as the Brown
Attention-Deficit Disorder Scales or the Yale-Brown Obsessive Compulsive Scale (Y-BOCS).
6. Acknowledged as the gold-standard measure of obsessive-compulsive disorder (OCD), a mental
disorder characterized by repetitive, intrusive ideas or behavior, the Y-BOCS measures the
presence and severity of symptoms.
● Demand characteristics are the features of a study that give cues about how someone is meant
to behave
● Reactivity refers to the phenomenon in which people's behavior is affected by the knowledge
that they are being observed
● A measure that is capable of differentiating one group of participants from another on a
particular construct may be said to have good discriminant validity
● Personal Orientation Inventory: based on Maslow
● Q-sort: rogers
● House tree person test: Buck
● Objective analytic tests battery : Cattell