Assignment Presentation Report

NOR SAHIDAH BINTI MOHAMAD ALI MPF1243 JB MP121197 880522-01-5516

INDIVIDUAL ASSIGNMENT (PRESENTATION REPORT)

Presentation 1: Building a Test (writing and evaluating test items) / Selection and decision analysis / Test administration

Introduction

Writing test items is a matter of precision, perhaps more akin to computer programming than to writing prose. A test item must focus the attention of the examinee on the principle or construct upon which the item is based. Ideally, students who answer a test item incorrectly will do so because their mastery of the principle or construct in focus was inadequate or incomplete. Any characteristic of a test item which distracts the examinee from the major point or focus of the item reduces the effectiveness of that item. Any item answered correctly or incorrectly because of extraneous factors in the item results in misleading feedback to both examinee and examiner.

A poet or writer, especially of fiction, relies on rich mental imagery on the part of the reader to produce an impact. For item writers, however, the task is to focus the attention of a group of students, often with widely varying background experiences, on a single idea. Such communication requires extreme care in the choice of words, and it may be necessary to try items out before problems can be identified.

Essential Characteristics of Item Writers

Given a task of precision communication, there are several attributes or mind-sets that are characteristic of a proficient item writer.

Knowledge and Understanding of the Material Being Tested

At the university level, the depth and complexity of the material on which students are tested necessitate that only faculty members fully trained in a particular discipline can write concise, unambiguous test items in that discipline. Further, the number of persons who can meaningfully critique test items, in terms of the principles or constructs involved, is limited. An agreement by colleagues to review each other's tests will likely improve the quality of items considerably prior to the first try-out with students.

Continuous Awareness of Objectives

A test must reflect the purposes of the instruction it is intended to assess. This quality of a test, referred to as content validity, is assured by specifying the nature and/or number of



items prior to selecting and writing the items. Instructors sometimes develop a chart or test blueprint to help guide the selection of items. Such a chart may consider the modules or blocks of content as well as the nature of the skills a test is expected to assess. In the case of criterion-referenced instruction, content validity is obtained by selecting a sample of criteria to be assessed. For content-oriented instruction, a balance may be achieved by selecting items in proportion to the amount of instructional time allotted to various blocks of material. An example of a test blueprint for a test with thirty-eight items is shown below.

Test Blueprint

Types of Tests   Knowledge   Comprehension   Application    Analysis of   Evaluation of   Total
                 of terms    of principles   of principles  situations    solutions
Reliability          3            4               2              4              1           14
Validity             1            4               6              2              1           14
Correlation          1            3               4              1              1           10
Total                5           11              12              7              3           38

The blueprint specifies the number of items to be constructed for each cell of the two-way chart. For example, in the above test blueprint, two items are to involve the application of the principles of reliability.

Continuous Awareness of Instructional Model

Different instructional models require items of quite different characteristics for adequate assessment. For example, appropriate item difficulty in a mastery-model situation might be a maximum value of 20 (twenty percent of the students answering incorrectly). On the other hand, items written for a normative model might have an appropriate average difficulty on the order of 30 to 40.



Ideally, item discrimination (the degree to which an item differentiates between students with high test scores and students with low test scores) should be minimal in a mastery-model situation. We would like to have all students obtain high scores. In the normative model, item discrimination should be as high as possible in order that the total test differentiates among students to the maximum degree.

Understanding of the Students for Whom the Items Are Intended

Item difficulty and discrimination are determined as much by the level of ability and range of ability of the examinees as they are by the characteristics of the items. Normative-model items must be written so that they provide the maximum intellectual challenge without posing a psychological barrier to student learning through excessive difficulty. In either the normative or mastery models, item difficulty must not be so low as to provide no challenge whatever to any examinee in a class. It is generally easier to adjust the difficulty than to adjust the discrimination of an item. Item discrimination depends to a degree on the range of examinee ability as well as on the difficulty of the item. It can be difficult to write mastery-model items which do not discriminate when the range of abilities among examinees is wide. Likewise, homogeneous abilities make it more difficult to write normative-model items with acceptably high discriminations. No matter what the instructional model or the range of abilities in a class, the only way to identify appropriate items is to select them on the basis of subjective judgment, administer them, and analyze the results. Then only items of appropriate difficulty and discrimination may be retained for future use.

Skill in Written Communication

An item writer's goal is to be clear and concise. The level of reading difficulty of the items must be appropriate for the examinees. Wording must not be more complicated than that used in instruction.
Skill in Techniques of Item Writing

There are many helpful hints and lists of pitfalls to avoid which may be helpful to the item writer. This is an area where measurement specialists may be particularly helpful. The remainder of this hand-out is devoted to item-writing tips.

Guidelines for Item Writing

Define clearly what you want to measure:
- Use substantive theory as a guide.
- Be as specific as possible.

Generate an item pool:
- Theoretically, all items are randomly chosen from a universe of item content.
- In practice: select and develop items, and avoid redundant items.

Avoid exceptionally long items; they can be confusing or misleading.

Keep the level of reading difficulty appropriate for those who will complete the scale. Avoid double-barreled items that convey two or more ideas at the same time. For example, "I vote Democratic because I support social programs" should be split into "I vote Democratic" and "I support social programs."

Consider mixing positively and negatively worded items. An acquiescence response set means that respondents tend to agree with most items. To avoid this bias, include items worded in the opposite direction (e.g., "I felt hopeful about the future" when asking about depression on the CES-D).



Item Format Test item development is a critical step in building a test that properly meets certain standards. A good test is only as good as the quality of the test items. If the individual test items are not appropriate and do not perform well, how can the test scores be meaningful? Therefore, test items must be developed to precisely measure the objectives prescribed by the blueprint and meet quality standards.

Test items are the building blocks of an exam. All test items are composed of three parts: item stem (question), correct answer(s), and distracters (incorrect responses). The term options refers generally to all the choices that are available. The person responding to the question can either select or construct the appropriate response to answer the question depending on the item format presentation. We will only discuss selected response test items in this document. Constructed response questions require different forms of scoring rubrics but still follow similar guidelines in how questions are formulated.

Test Item Formats There are many test item formats that can be used in a computer-based examination. The most popular formats include, but are not limited to, essay, short answers, multiple-choice, multiple-response, matching, and simulations. These item formats can be scored either objectively by computers or subjectively through judgments by human evaluators. The following discussion solely relates to objectively scored test items where examinees select a response(s) to a question and it is scored by a computer program.

The multiple-choice and multiple-response formatted test items are the most popular, since they can be scored easily and reliably by machines as compared to examinee-constructed responses. Also, these item formats are relatively easy to write and place in published examinations for either paper exams or computer-administered exams. Readers should not be surprised by what the most popular question formats are, since they have probably seen these formats throughout their education. The multiple-choice format is sometimes criticized since some laypeople consider it a poor way to evaluate a person's



knowledge and skills. We often hear this format referred to as "multiple guess." Yet research has shown that these items do perform well when they are well constructed.

TRUE-FALSE QUESTIONS
This test item format is presented first since it is the simplest selected-response format. The true-false question is a statement of fact and gives the option to select True or False as an answer choice.

The true-false item format, if used, should be used infrequently (< 10% of items). This item format can work well when constructed properly. However, more often than not, this item format fails to perform well statistically. Donath has come to recognize the failure of this item format and recommends not using it at all when developing test questions. Time is much better spent on developing multiple-choice or multiple-response format questions. Most true-false questions can be rewritten as multiple-choice questions.

MULTIPLE-CHOICE QUESTIONS
The multiple-choice format is made up of an item stem, an answer, and three or more distracters when possible. Two-distracter questions can work if it is difficult to write another distracter. There is only one correct answer in this format, and items can be written so that they measure not only knowledge of facts but also higher-order thinking that requires problem solving or critical thinking.

USE OF GRAPHICS/EXHIBITS (HOT AREA)
Graphics or exhibits can be used within this format as well as with multiple-response formats. The format requires an examinee to choose or identify a specific location on a picture (graphic) by clicking on it. The hot-spot graphic has to have areas identified as incorrect choices and an area that is correct.

MULTIPLE-RESPONSE QUESTIONS



This item format has an item stem and more than one correct answer. Essentially, this is a combination of two or three multiple choice items in one. It is generally more difficult to answer and also discriminates very well between those who are proficient and those who are not proficient with the subject area being tested.

When constructing a multiple response item, consider the following:

- Plan to have two correct answers out of five choices, or three correct out of five or six choices.
- Always remove distracters that are not being selected by examinees.
- Do not use "Choose all that apply"; instead, identify the number of choices that are needed to supply a complete correct response. It is important, as a matter of fairness, to tell examinees how many correct choices there are.
- Identify the number of correct options. Use the phrase "Choose XXX" in the item stem, presented in parentheses, e.g., (Choose TWO) or (Choose THREE).
- Score test items so that selecting only the correct options counts as a correct response. Do not give partial credit: if an examinee selects two correct options but also a third, incorrect option, score the item as incorrect.
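The all-or-nothing scoring rule above can be sketched in a few lines of Python. The helper name and the option labels are hypothetical, not part of any particular testing platform:

```python
def score_multiple_response(selected, correct):
    """Score a multiple-response item: full credit only for an exact match.

    Any incomplete selection, or a selection that adds an incorrect option
    to the correct ones, scores 0 -- no partial credit is given.
    """
    return 1 if set(selected) == set(correct) else 0

# A hypothetical (Choose TWO) item whose correct options are B and D:
correct = {"B", "D"}
print(score_multiple_response({"B", "D"}, correct))       # exact match -> 1
print(score_multiple_response({"B"}, correct))            # incomplete -> 0
print(score_multiple_response({"B", "D", "E"}, correct))  # extra wrong option -> 0
```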

Item Analysis

Introduction: It is widely believed that assessment drives the curriculum. Hence it can be argued that if the quality of teaching, training, and learning is to be upgraded, assessment is the obvious starting point. However, upgrading assessment is a continuous process: the cycle of planning and constructing assessment tools, followed by testing, validating, and reviewing, has to be repeated continuously. When tests are developed for instructional purposes, to assess the effects of educational programs, or for educational research purposes, it is very important to conduct item and test analyses. These analyses evaluate the quality of the items and of the test as a whole. Such analyses can also be employed to revise and improve both items and the test as a whole.



Quantitative item analysis is a technique that enables us to assess the quality or utility of an item. It does so by identifying distractors or response options that are underperforming. Item-analysis procedures are intended to maximize test reliability. Because maximization of test reliability is accomplished by determining the relationship between individual items and the test as a whole, it is important to ensure that the overall test is measuring what it is supposed to measure. If this is not the case, the total score will be a poor criterion for evaluating each item. The use of a multiple-choice format for hour exams at many institutions leads to a deluge of statistical data, which are often neglected or completely ignored. This paper will introduce some of the terms encountered in the analysis of test results, so that these data may become more meaningful and therefore more useful.

Need for Item Analysis
1) Provision of information about how the quality of test items compares. The comparison is necessary if subsequent tests of the same material are going to be better.
2) Provision of diagnostic information about the types of items that students most often get incorrect. This information can be used as a basis for making instructional decisions.
3) Provision of a rational basis for discussing test results with students.
4) Communication to the test developer of which items need to be improved or eliminated and replaced with better items.

What is the Output of Item Analysis?
Item analysis can yield the following outputs:
- Distribution of responses for each distractor of each item, or frequencies of responses (histogram).
- Difficulty index for each item of the test.
- Discrimination index for each item of the test.



- Measure of the exam's internal consistency (reliability).

Total Score Frequencies
Issues to consider when interpreting the distribution of students' total scores:

Distribution:
- Is this the distribution you expected? Was the test easier or more difficult than you anticipated?
- How does the mean score of this year's class compare to scores from previous classes?
- Is there a ceiling effect, that is, are all scores close to the top?
- Is there a floor effect, that is, are all scores close to the lowest possible score?

Spread of Scores:
- Is the spread of scores large?
- Are there students who are scoring low marks compared to the majority of the students? Can you determine why they are not doing as well as most other students? Can you provide any extra assistance?
- Is there a group of students who are well ahead of the other students?
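The distribution and spread questions above can be checked numerically. The sketch below flags possible ceiling and floor effects; the 90%/10% thresholds and the "more than half the class" criterion are illustrative assumptions, not standard cutoffs:

```python
from statistics import mean, stdev

def summarize_scores(scores, max_score):
    """Summarize a class's total scores: central tendency, spread, and
    simple ceiling/floor flags (a flag fires when more than half the
    class scores within 10% of the top or bottom of the scale)."""
    near_top = sum(s >= 0.9 * max_score for s in scores) / len(scores)
    near_bottom = sum(s <= 0.1 * max_score for s in scores) / len(scores)
    return {
        "mean": mean(scores),
        "sd": stdev(scores),
        "ceiling_effect": near_top > 0.5,
        "floor_effect": near_bottom > 0.5,
    }

# Hypothetical totals for a class on the 38-item test above:
scores = [34, 36, 37, 37, 38, 35, 36, 33, 37, 38]
print(summarize_scores(scores, max_score=38))
```

For this class most scores bunch near the maximum, so the ceiling-effect flag fires and the test was probably too easy for the group.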

Difficulty Index
The difficulty index tells us how easy the item was for the students in that particular group: the proportion of examinees who answered the item correctly. The higher the difficulty index, the easier the question; the lower the difficulty index, the more difficult the question. The difficulty index is, in fact, an easiness index.
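For a dichotomously scored item, the index defined above is just the proportion correct, which can be sketched as:

```python
def difficulty_index(responses):
    """Difficulty (easiness) index: proportion of examinees answering
    the item correctly. `responses` is a list of 0/1 scores for one
    item across the whole group."""
    return sum(responses) / len(responses)

# Hypothetical item: 24 of 30 examinees answered correctly.
item = [1] * 24 + [0] * 6
print(difficulty_index(item))  # 0.8
```

An index of 0.8 falls in the 0.70-0.84 band, so this item would be classed as "Easy" under the interpretation table below.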

Issues to consider in relation to the Difficulty Index



Are the easiest items in your test, i.e. those with the highest difficulty (easiness) index, the first items in the test? If the more difficult items occur at the start of the test, students can become upset because they feel, early on, that they cannot succeed. The literature quotes the following (generalized) interpretation of the Difficulty Index.

Indexed Range    Inference to Question
0.85 - 1.00      Very Easy
0.70 - 0.84      Easy
0.30 - 0.69      Optimum
0.15 - 0.29      Hard
0.00 - 0.14      Very Hard

Item Discrimination Index (DI)
This is calculated by subtracting the proportion of students answering the item correctly in the lower group from the proportion correct in the upper group. It is assumed that persons in the top third on total scores should have a greater proportion answering the item correctly than the lower third. The calculation of the index is an approximation of the correlation between the scores on an item and the total score. Therefore, the DI is a measure of how successfully an item discriminates between students of different abilities on the test as a whole. Any item which does not discriminate between the lower and upper groups of students will have a DI of 0. An item where the lower group performed better than the upper group will have a negative DI. The discrimination index is affected by the difficulty of an item because, by definition, if an item is very easy everyone tends to get it right and it does not discriminate; likewise, if it is very difficult everyone tends to get it wrong. Such items can be important to have in a test because they help define the range of difficulty of concepts assessed. Items should not be discarded just because they do not discriminate.
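Using the upper-third/lower-third definition above, the DI can be sketched as follows. This is a minimal illustration with made-up data; ties in total scores are broken arbitrarily here:

```python
def discrimination_index(item_scores, total_scores):
    """Discrimination index: proportion answering the item correctly in
    the top third on total score minus the proportion correct in the
    bottom third."""
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    k = len(order) // 3
    lower, upper = order[:k], order[-k:]
    p_upper = sum(item_scores[i] for i in upper) / k
    p_lower = sum(item_scores[i] for i in lower) / k
    return p_upper - p_lower

# Hypothetical data for nine examinees: the item is answered correctly
# mainly by those with high total scores.
item   = [0, 0, 1, 0, 1, 1, 1, 1, 1]
totals = [10, 12, 14, 16, 18, 20, 24, 26, 30]
print(discrimination_index(item, totals))  # about 0.67
```

A DI of about 0.67 falls well inside the 0.30-1.00 band, so this item discriminates acceptably under the interpretation table below.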

Issues to consider in relation to the Item Discrimination Index



Are there any items with a negative discrimination index (DI)? That is, items where students in the lower third of the group did better than students in the upper third? Was this a deceptively easy item? Was the correct answer key used? Are there any items that do not discriminate between the students, i.e. where the DI is 0.00 or very close to it? Are these items either very hard or very easy, which would explain a DI of 0?

The literature quotes the following (generalized) interpretation of the Discrimination Index.

Indexed Range    Inference to Question
Below 0.19       Poor
0.20 - 0.29      Dubious
0.30 - 1.00      Okay

SELECTION AND DECISION ANALYSIS

The Test Manual
1. Proprietary
- Owned by the test developer or a publishing company.
- Protected by copyright law; users must pay to use it.
- E.g., the Minnesota Multiphasic Personality Inventory and the Stanford-Binet Intelligence Scale.

2. Non-proprietary
- Not protected by copyright law.
- Distributed by the test developer or published in journals.
- Users pay no royalties.



- E.g., the Center for Epidemiologic Studies Depression Scale (CES-D).
You need to consult the test manual to determine whether a given test is suited to your purpose.

Base Rate In probability and statistics, base rate generally refers to the (base) class probabilities unconditioned on featural evidence, frequently also known as prior probabilities. In plainer words, if it were the case that 1% of the public were "medical professionals", and 99% of the public were not "medical professionals", then the base rate of medical professionals is simply 1%.

In science, particularly medicine, the base rate is critical for comparison. It may at first seem impressive that 1000 people beat their winter cold while using 'Treatment X', until we look at the entire 'Treatment X' population and find that the base rate of success is actually only 1/100 (i.e. 100 000 people tried the treatment, but the other 99 000 people never really beat their winter cold). The treatment's effectiveness is clearer when such base rate information (i.e. "1000 people... out of how many?") is available. Note that controls may likewise offer further information for comparison; maybe the control groups, who were using no treatment at all, had their own base rate success of 5/100. Controls thus indicate that 'Treatment X' actually makes things worse, despite that initial proud claim about 1000 people.
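The arithmetic of this example is worth making explicit; the sketch below simply restates the numbers from the paragraph above:

```python
# 'Treatment X': 1000 people beat their cold out of 100,000 who tried it.
treated_rate = 1000 / 100_000   # base rate of success with treatment: 1%

# Untreated controls: 5 recoveries per 100 people.
control_rate = 5 / 100          # base rate of success without treatment: 5%

print(f"treated: {treated_rate:.0%}, control: {control_rate:.0%}")
print("treatment helps" if treated_rate > control_rate else
      "treatment does not help")
```

The raw count (1000 recoveries) sounds impressive, but once both base rates are on the table, the treated group recovers at a lower rate than the controls.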

Taylor-Russell Tables
The Taylor-Russell tables help us see the connection between the validity of a test and the likelihood of the test resulting in a successful selection of a job candidate. Let's look at the table below. It is actually three different tables, each created around a different base rate of success. The concept of base rate of success simply means the percentage of people in a population that could successfully perform a job. In other words, if you were to simply randomly select people for a job, the base rate of success is the percentage that would be successful at the job.



In the three tables below, the easiest job is the one with the highest base rate of success and the most difficult job is the one with the lowest base rate of success (although a base rate of success of 0.30 is still a non-skilled job).
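Since the tables themselves are not reproduced here, a Monte Carlo sketch of the model behind them can illustrate the idea. The bivariate-normal assumption is the standard basis of the Taylor-Russell tables, but the function below is our own illustration and the numbers it produces are simulation estimates, not values copied from the published tables:

```python
import random

def success_rate_among_selected(validity, base_rate, selection_ratio,
                                n=200_000, seed=0):
    """Simulate (test score, job performance) pairs correlated at
    `validity`, hire the top `selection_ratio` fraction on the test,
    and return the proportion of hires who succeed on the job (i.e.
    fall in the top `base_rate` fraction of performance)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.gauss(0, 1)                               # test score
        e = rng.gauss(0, 1)                               # noise
        y = validity * x + (1 - validity ** 2) ** 0.5 * e  # performance
        pairs.append((x, y))
    xs = sorted(p[0] for p in pairs)
    ys = sorted(p[1] for p in pairs)
    x_cut = xs[int((1 - selection_ratio) * n)]   # hiring cutoff on the test
    y_cut = ys[int((1 - base_rate) * n)]         # cutoff defining job success
    hired = [p for p in pairs if p[0] >= x_cut]
    return sum(p[1] >= y_cut for p in hired) / len(hired)

# With zero validity, hiring by test score is no better than random:
print(round(success_rate_among_selected(0.0, 0.30, 0.20), 2))
# A test with validity 0.50 raises the success rate among those hired:
print(round(success_rate_among_selected(0.5, 0.30, 0.20), 2))
```

With validity 0 the success rate among hires equals the base rate (about 0.30 here); raising the validity raises the proportion of successful hires, which is exactly the relationship the tables summarize.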

Utility Theories and Decision Analysis
Utility theory is used in decision analysis to determine the EU (expected utility) of some action based on the U (utility) of its possible results. A utility function U(W) maps from states (W being a world state) to real numbers, e.g. U(W) = 5. A utility function applies when there is a lottery (uncertainty involved); a value function applies when you are evaluating the worth of actual states. A value or fitness function for chess would score a winning game, where there is no uncertainty. A value for backgammon, however, is a utility function, since the rolling of the dice brings in uncertainty, probability, and chance. An action can have more than one possible result. In the simple case an action's result is one deterministic state (e.g. the lights are on or off); in this case the EU (expected utility) of the action is equal to the U (utility) of its result.


In reality there are usually a number of possible results for each possible action. Let A1, ..., An represent the possible actions to take and W1, ..., Wm the possible resulting worlds. Then

    EU(Ak) = sum over i of p(Wi|Ak) * U(Wi)

where U(Wi) is the utility of Wi and p(Wi|Ak) is the probability of Wi resulting from taking action Ak. Thus, the EU of an action Ak is the sum of the utilities of all its possible results, each multiplied by the probability of that result happening.
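The expected-utility sum can be sketched directly in Python. The actions, probabilities, and utilities below are hypothetical, chosen only to show the mechanics:

```python
def expected_utility(outcomes):
    """EU of an action: sum over its possible result states of
    p(Wi|A) * U(Wi). `outcomes` is a list of (probability, utility) pairs."""
    return sum(p * u for p, u in outcomes)

def best_action(actions):
    """Pick the action with maximum expected utility (MEU)."""
    return max(actions, key=lambda a: expected_utility(actions[a]))

# Hypothetical decision: each action maps to [(probability, utility), ...]
actions = {
    "hire_by_test":   [(0.6, 10), (0.4, -2)],   # EU = 0.6*10 + 0.4*(-2) = 5.2
    "hire_at_random": [(0.3, 10), (0.7, -2)],   # EU = 0.3*10 + 0.7*(-2) = 1.6
}
print(best_action(actions))  # hire_by_test
```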

An action could also have one result with a continuous measure of degrees of intensity and a continuous probability distribution. To get the utility of such a result one would take the integral of its utility values as a function of its probability distribution. This is covered in terms of attributes below. It is presumed that the action with the MEU (maximum expected utility) should be chosen. There is also the issue of sequential actions. Each Wi or action result can have any number of attributes. Wij might be owning a car with j attributes: gas mileage, possible service costs, color, speed, etc. In some cases strict dominance of U(Wxj) over U(Wyj) can be asserted, if the utility values of the j attributes are readily known and U(Wxj) > U(Wyj).

In most cases stochastic dominance must suffice, since we usually don't know the exact values of all attributes before the actions take place. For example, the possible service costs of owning a car depend on the probability of the car needing service at each possible cost. Suppose we compare the actions of buying car 1 and buying car 2, where:

- X represents the possible service costs,
- p1(x) is the probability distribution of car 1's possible service costs,
- p2(x) is the probability distribution of car 2's possible service costs,
- a is the least amount of money you can spend on service, and
- b is the most you can spend on service.



If, for every cost level t between a and b, the integral from a to t of p1(x) dx is greater than or equal to the integral from a to t of p2(x) dx, then buying car 1 stochastically dominates buying car 2: at every cost level, car 1 is at least as likely as car 2 to cost that much or less. If an action A stochastically dominates all other actions on all attributes, then for any monotonically nondecreasing utility function the expected utility of A is at least as high as the expected utility of all other actions.
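For discrete cost distributions, the integral condition reduces to comparing cumulative probabilities at each cost level. The car data below are hypothetical, chosen so that the probabilities are exactly representable:

```python
def cdf(costs, probs, x):
    """P(cost <= x) for a discrete cost distribution."""
    return sum(p for c, p in zip(costs, probs) if c <= x)

def stochastically_dominates(costs, p1, p2):
    """With lower cost being better, distribution 1 dominates distribution 2
    if at every cost level x, P1(cost <= x) >= P2(cost <= x)."""
    return all(cdf(costs, p1, x) >= cdf(costs, p2, x) for x in costs)

# Hypothetical service-cost distributions over the same cost levels:
costs = [100, 500, 1000]
car1  = [0.5, 0.25, 0.25]   # low costs are more likely for car 1
car2  = [0.25, 0.25, 0.5]
print(stochastically_dominates(costs, car1, car2))  # True
```

Car 1's cumulative probabilities (0.5, 0.75, 1.0) stay at or above car 2's (0.25, 0.5, 1.0) at every cost level, so buying car 1 stochastically dominates buying car 2 on the service-cost attribute.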

Test Administration

The Examiner and the Subject
a) The Relationship Between Examiner and Test Taker
The behavior of the examiner, and the relationship between examiner and test taker, can affect test scores.

In one study, half the children were given the test under an enhanced rapport condition (the examiner used friendly conversation and verbal reinforcement during test administration); the other half took the test under a neutral rapport condition (the examiner neither initiated conversation nor used reinforcement). Examiner rapport had little effect on the scores of younger children (3rd grade), but average IQ scores (6th and 9th grade) were higher for those who had received the test under the enhanced rapport condition than for those with a neutral administrator. In another comparison, one examiner made approving comments ("good" or "fine") and another used disapproving comments ("I thought you could do better than that"). Children who took the test under the disapproving examiner received lower scores than children exposed to a neutral or an approving examiner.

A familiar examiner may make a difference in the scores of younger children; IQ test scores increase for children from a lower socioeconomic class when the examiner is familiar. Familiarity with the test taker, and preexisting notions about a test taker's ability, can bias test results either positively or negatively.

Interviewer effects In attitudinal surveys, respondents may give the response that they perceive to be expected by the interviewer.



- Interviewed by telephone, respondents might take their cues from the sex and age of the interviewer.
- People tend to disclose more information in a self-report format than they do to an interviewer.
- People report more symptoms and health problems in a mailed questionnaire than they do in a face-to-face interview.
- Computer administration is at least as reliable as traditional test administration.

b) The Race of the Tester
- There is little evidence that the race of the examiner significantly affects intelligence test scores.
- No differences were found between children given the Stanford-Binet by an African American examiner and by a white one.
- No significant difference in intelligence test scores between African American and white children was found to be influenced by having a trained African American or white examiner.
- The procedures for administering an IQ test are so specific and strict that well-trained African American and white administrators act almost identically; studies show no effects.
- Examiner effects tend to increase when examiners are given more discretion about the use of the test. In one study where the examiners were paraprofessionals rather than psychologists, the white examiners obtained higher scores from white children than from African American children, while scores for both groups of children were comparable when tested by African American examiners.
- African American children obtained higher test scores when the items were administered in a thematic mode.
- Some children score lower on IQ tests because of poorer reading skills.

c) Language of the Test Taker
- Some tests are inappropriate for people whose knowledge of the language is questionable.
- Translating a test is difficult; it cannot be assumed that the validity and reliability of the translation are comparable to those of the English version.
- For test takers who are proficient in two or more languages, the test should be given in the language the test taker feels is their best.
- Test interpreters can introduce bias into the testing situation.



d) Training of the Administrators
- Different assessment procedures require different levels of training.
- Psychiatric diagnoses obtained using the Structured Clinical Interview for DSM-IV (SCID) require users who are licensed psychiatrists or psychologists with additional training on the test.
- There is no standardized protocol for training people to administer complicated tests such as the Wechsler Adult Intelligence Scale-Revised (WAIS-R), although these tests are usually administered by licensed psychologists.

e) Expectancy Effects
- Data sometimes can be affected by what an experimenter expects to find.
- Robert Rosenthal and his colleagues at Harvard University conducted experiments on expectancy effects, called Rosenthal effects: examinees tended to do poorly when examiners had been led to expect poor performance.
- Some have challenged these findings: 1) claiming that they are based on unsound statistical procedures or faulty design; 2) arguing that the expectancy effect exists in some but not all situations.
- Expectancies shape judgments in many important ways. Grant reviewers are supposed to judge the quality of proposals independently, but reviewers' expectancies about the investigators do influence their judgments.
- Two aspects of expectancy effects relate to the use of standardized tests: 1) expectancy effects (in Rosenthal's experiments) were obtained even when all the experimenters followed a standardized script; they appear to arise from nonverbal communication between the experimenter and a subject who is not aware of his or her role in the process; 2) the expectancy effect has a small and subtle effect on scores and occurs in some situations but not in others.
- Expectancy effects can impact intelligence testing in many ways, such as scoring: graduate students with some training in intelligence testing tended to give more credit to responses purportedly from bright test takers.
- A variety of interpersonal and cognitive process variables affect our judgment of others.



- A physician's voice can affect the way patients rate their doctor.

f) Effects of Reinforcing Responses
- Reinforcement affects behavior; inconsistent use of feedback can damage the reliability and validity of test scores.
- Incentives can help improve performance on IQ tests for subgroups of children. When students 6 to 13 years old received tokens they could exchange for money, performance improved for lower-class white children but not for middle-class children or lower-class African American children (Sweet, 1970).
- Children will work quite hard to obtain praise such as "You are doing well" (Eisenberger & Cameron, 1998). The effects of praise are as strong as the effects of money or candy (Merrell, 1999).
- Girls increased their accuracy on the WISC block design subtest when given any type of reinforcement for correct responses; boys increased their accuracy only when given chips that could be exchanged for money. With verbal praise, boys increased their speed while girls decreased theirs.

- African American children did not respond as well to verbal reinforcement as they did when the reinforcement was candy or money (Schultz & Sherman, 1976), possibly because the verbal reinforcement was not culturally relevant (Terrell, Taylor & Terrell, 1978). They did respond to culturally relevant reinforcement such as "Nice job, little brother" or "Nice job, Blood."
- The way an interviewer responds affects the content of responses in interview studies (Cannell & Henson, 1974): people reported more symptoms if they had been reinforced.

- Random reinforcement destroys the accuracy of performance and decreases the motivation to respond (Eisenberger & Cameron, 1998).



- The effects of random feedback are rather severe, causing depression, low motivation for responding, and an inability to solve problems, a pattern known as learned helplessness (Abramson, Alloy & Metalsky, 1995).
- Test administrators should therefore exert strict control over the use of feedback.

g) Computer-Assisted Test Administration

- Computers offer easy access, presentation of the test items, and automatic recording of test responses.
- Advantages of computers: excellence of standardization; individually tailored sequential administration; precision in timing responses; release of human testers for other duties; patience (the test taker is not rushed); control of bias.

- There are advantages in test administration, scoring, and interpretation, including ease of applying complicated psychometric methods and the integration of testing and cognitive psychology.

- Computers are objective and cost-effective, and allow more experimental control than other methods of administration: they can place a precise limit on the amount of time any one item can be studied, and they can prevent test takers from looking ahead at other sections of the test or going back to sections already completed.
- Computer administration ensures standardization and control and also reduces scoring errors.
- Computers can obtain sensitive information: students were less likely to disclose socially undesirable information during a personal interview than on a computer.



- People are often more honest when tested by a computer than by a person, and computerized testing performs at least as well as traditional assessment.
- Computer-generated test reports in the hands of an inexperienced psychologist cannot replace clinical judgment and can cause harm if misinterpreted.
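The timing and navigation controls described above can be sketched as a small driver. Everything here (the class, the item format, the 60-second default) is a hypothetical illustration, not the API of any real testing package; the clock is injectable so the timing rule can be exercised without waiting in real time:

```python
import time

class ComputerizedTest:
    """Forward-only, per-item-timed test administration (illustrative only)."""

    def __init__(self, items, time_limit_s=60.0, clock=time.monotonic):
        self.items = list(items)          # fixed presentation order
        self.time_limit_s = time_limit_s  # precise limit per item
        self.clock = clock                # injectable clock, eases testing
        self.responses = []
        self._index = 0
        self._shown_at = None

    def show_next_item(self):
        """Present the next item; completed items cannot be revisited."""
        if self._index >= len(self.items):
            raise IndexError("test complete")
        self._shown_at = self.clock()
        return self.items[self._index]

    def record_response(self, answer):
        """Record the answer, discarding it if the time limit was exceeded."""
        elapsed = self.clock() - self._shown_at
        timed_out = elapsed > self.time_limit_s
        self.responses.append({
            "item": self._index,
            "answer": None if timed_out else answer,
            "seconds": elapsed,
            "timed_out": timed_out,
        })
        self._index += 1  # advance only: no going back to earlier sections
```

Because the index only moves forward and each item is stamped when shown, the sketch enforces both controls the text mentions: no looking back, and a precise per-item time record.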

h) Subject Variables

- Motivation and anxiety can greatly affect test scores. Many college students suffer from a debilitating condition known as test anxiety; they often have difficulty focusing attention on the test items and are distracted by thoughts such as "I am not doing well" or "I am running out of time." Test anxiety has three components: worry, emotionality, and lack of self-confidence.
- Illness affects test scores: people do not perform as well as when they are feeling well, and health status affects performance both in behavior and in thinking. The elderly may do better in individual testing sessions.
- Normal hormonal variations affect test performance. Healthy women experience variations in their perceptual and motor performance as a function of the menstrual cycle: women may perform better on tests of speeded motor coordination at other points in the cycle than they would during menstruation, while the same women may perform more poorly on tests of perceptual and spatial ability during mid-cycle than during menses. Men also vary in test performance as a function of variations in sex hormones.


Behavioral Assessment Methodology

- Measurement goes beyond the application of psychological tests; many assessment procedures involve the direct observation of behavior.
- In behavioral observation studies, the observer plays a more active role in recording the data.
- Problems include reactivity, drift, and expectancies.

a) Reactivity

- Reliability and accuracy are highest when someone is checking on the observers; this is called reactivity because it is a reaction to being checked.
- Accuracy and interrater agreement decrease when observers believe their work is not being checked.
- An experimenter might randomly check on the performance of the observers without their knowledge.

b) Drift

- Observers have a tendency to drift away from the strict rules they followed in training and to adopt idiosyncratic definitions of behavior.
- Contrast effect: the tendency to rate the same behavior differently when observations are repeated in the same context.
- Frequent meetings to discuss method can reduce these effects.
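Checking observers against each other, as described here, reduces to a simple proportion-of-agreement computation. A minimal sketch with invented interval records (1 = target behavior seen in that interval, 0 = not seen):

```python
def interrater_agreement(obs_a, obs_b):
    """Proportion of observation intervals on which two observers agree."""
    if len(obs_a) != len(obs_b):
        raise ValueError("observers must rate the same intervals")
    matches = sum(1 for a, b in zip(obs_a, obs_b) if a == b)
    return matches / len(obs_a)

# Hypothetical interval-by-interval records from two observers.
rater_a = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
rater_b = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]
print(interrater_agreement(rater_a, rater_b))  # → 0.8
```

Raw percent agreement ignores agreement expected by chance; a statistic such as Cohen's kappa corrects for that, but the monitoring logic (compute agreement on randomly checked sessions) is the same.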

c) Expectancies

- Some studies find that administrator expectancies can affect scores on individual IQ tests, whereas other studies do not.
- Behavioral observers tend to notice the behavior they expect to see.
- Bias can arise when observers receive reinforcement for recording a particular behavior, which can distort the behavioral data.



d) Statistical Control of Rating Errors

- The halo effect is the tendency to ascribe positive attributes to a person independently of the observed behavior.
- This effect can be controlled through partial correlation, in which the correlation between two variables is computed while variability in a third variable is held constant.
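The partial correlation just described has a closed form: r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)). A minimal plain-Python sketch; the idea is that if two trait ratings correlate only because both reflect a rater's overall liking for the person (a hypothetical halo variable z), the partial correlation collapses toward zero or below:

```python
import math

def pearson_r(x, y):
    """Ordinary Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def partial_r(x, y, z):
    """Correlation of x and y with variability in z statistically held constant."""
    rxy, rxz, ryz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))
```

With made-up "competence" and "leadership" ratings that both track an overall-likability score, pearson_r comes out high while partial_r, the halo-controlled estimate, does not.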





Presentation 2: Projective tests of personality

A projective test is designed to reveal hidden emotions and internal conflicts via a subject's responses to ambiguous stimuli. Instead of being scored against a universal standard, as with an objective personality test, content from projective tests is analyzed for meaning. Projective personality tests are supposed to be able to measure areas of the unconscious mind such as personality characteristics, fears, doubts, and attitudes. Some employers use these types of tests to try to see whether you are an appropriate fit for their work environment.

Francis Galton invented this method of testing. His first experiment, conducted in 1879, consisted of choosing a selection of words and letting his mind free-associate. He then took the words he generated in reaction to the original list and put them into new classifications, which led him to think more about the possibilities of the subconscious and its role in thought.

Strengths and Weaknesses of Projective Tests Projective tests are most frequently used in therapeutic settings. In many cases, therapists use these tests to learn qualitative information about a client. Some therapists may use projective tests as a sort of icebreaker to encourage the client to discuss issues or examine thoughts and emotions. While projective tests have some benefits, they also have a number of weaknesses and limitations. For example, the respondent's answers can be heavily influenced by the examiner's attitudes or the test setting. Scoring projective tests is also highly subjective, so interpretations of answers can vary dramatically from one examiner to the next.



Additionally, projective tests that do not have standard grading scales tend to lack both validity and reliability. Validity refers to whether or not a test is measuring what it purports to measure, while reliability refers to the consistency of the test results.

However, these tests are still widely used by clinical psychologists and psychiatrists. Some experts suggest that the latest versions of many projective tests have both practical value and some validity.

Rorschach inkblot test

The Rorschach inkblot test is a psychological projective test of personality in which a subject's interpretations of ten standard abstract designs are analysed as a measure of emotional and intellectual functioning and integration. The test is named after Hermann Rorschach (1884-1922), who developed the inkblots, although he did not use them for personality analysis.

The test is considered "projective" because the patient is supposed to project his or her real personality into the inkblot via the interpretation. The inkblots are purportedly ambiguous, structureless entities which are to be given a clear structure by the interpreter. Those who believe in the efficacy of such tests think that they are a way of getting into the deepest recesses of the patient's psyche or subconscious mind. Those who give such tests believe themselves to be experts at interpreting their patients' interpretations.

What evidence is there that an interpretation of an inkblot (or a picture drawing or sample of handwriting, other items used in projective testing) issues from a part of the self that reveals true feelings, rather than, say, creative expression? What justification is there for assuming that any given interpretation of an inkblot does not issue from a part of the self bent on deceiving others, or on deceiving oneself for that matter? Even if the interpretations issued from a part of the self which expresses desires, it is a long jump from having desires to having committed actions. For example, an interpretation may unambiguously express the desire to have sex with the therapist, but that does not imply either that the patient has had sex with the therapist or that the patient, if given the opportunity, would agree to have sex with the therapist.

Rorschach testing is inherently problematic. For one thing, to be truly projective the inkblots must be considered ambiguous and without structure by the therapist. Hence, the therapist must not make reference to the inkblot in interpreting the patient's responses or else the therapist's projection would have to be taken into account by an independent party. Then the third person would have to be interpreted by a fourth ad infinitum. Thus, the therapist must interpret the patient's interpretation without reference to what is being interpreted. Clearly, the inkblot becomes superfluous. You might as well have the patient interpret spots on the wall or stains on the floor. In other words, the interpretation must be examined as if it were a story or dream with no particular reference in reality. Even so, ultimately the therapist must make a judgment about the interpretation, i.e., interpret the interpretation. But again, who is to interpret the therapist's interpretation? Another therapist? Then, who will interpret his? etc.

To avoid this logical problem of having a standard for a standard for a standard, etc., the experts invented standardized interpretations of interpretations. Both form and content are standardized. For example, a patient who attends only to a small part of the blot is "indicative of obsessive personality;" while one who sees figures which are half-human and half-animal indicates that he is alienated, perhaps on the brink of schizophrenic withdrawal from people (Dawes, 148). If there were no standardized interpretations of the interpretations, then the same interpretations by patients could be given equally valid but different interpretations by therapists. What empirical tests have been done to demonstrate that any given interpretation of an inkblot is indicative of any past behaviour or predictive of any future behaviour? In short, interpreting the inkblot test is about as scientific as interpreting dreams.

To have any hope of making the inkblot test appear to be scientifically valid, it was essential that it be turned into a non-projective test. The blots can't be considered completely formless, but must be given a standard response against which the interpretations of patients are to be compared as either good or bad responses. This is what John E. Exner did. The Exner System uses inkblots as a standardized test. On its face, the concept seems preposterous. Imagine admitting people into med school on the basis of such a standardized test! Or screening candidates for the police academy! ("I didn't get in because I failed the inkblot test.")

The Rorschach enthusiast should recognize that inkblots or dreams or drawings or handwriting may be no different in structure than spoken words or gestures. Each is capable of many interpretations, some true, some false, some meaningful, some meaningless. It is an unprovable assumption that dreams or inkblot interpretations issue from a source deep in the subconscious which wants to reveal the "real" self. The mind is a labyrinth and it is a pipe dream to think that the inkblot is Ariadne's thread which will lead the therapist to the centre of the patient.

Precautions/Limitations of the Rorschach

- Results are poorly verifiable.
- There is declining adherence to the Freudian principle of repression on which the test is based.
- Loren and Jean Chapman (1960s) found that responses thought to be indicative of homosexuality were just as likely to be given by heterosexual males. Students told that a male test taker might be homosexual were more likely to read through inkblot responses and interpret them that way, even when the pairings of responses and labels had been constructed arbitrarily.
- Illusory correlation: the phenomenon of seeing the relationship one expects in a set of data even when no such relationship exists.
- Subject to bias (countertransference). For example, if the individual comes from a lower socioeconomic background or a different race, will the clinician tend to see inkblot responses as more negative if he or she happens to be prejudiced? In other words, will the clinician project his or her own unconscious impulses about the test taker onto the analysis?
- Overpathologizes normal individuals: the Rorschach identifies half of all test takers as possessing distorted thinking, a false-positive rate unexplained by current research.
- The test's reliability is also thought to depend substantially on details of the testing procedure: where the tester and subject are seated; introductory words; verbal and non-verbal responses to subjects' questions or comments; and how responses are recorded.



- Procedures for coding responses, while fairly well specified, are extremely time-consuming for inexperienced examiners, who may cut corners as a result. Exner has published detailed instructions, but Wood et al. (2003) cite many court cases where these have not been followed.
- Exner's system was thought to possess normative scores for various populations, but beginning in the mid-1990s, others tried to replicate or update these norms and failed. In particular, discrepancies seemed to focus on indices measuring narcissism, disordered thinking, and discomfort in close relationships.
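The illusory correlation listed among these precautions can be demonstrated with a short simulation: when a "sign" and a "diagnosis" are generated completely at random, their measured association is essentially zero, whatever relationship an expectant scorer believes is there. The variable names and sample size below are arbitrary choices for the sketch:

```python
import random

def phi_coefficient(a, b):
    """Association between two equal-length 0/1 sequences (phi coefficient)."""
    n11 = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    n10 = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    n01 = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)
    n00 = len(a) - n11 - n10 - n01
    denom = ((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)) ** 0.5
    return 0.0 if denom == 0 else (n11 * n00 - n10 * n01) / denom

random.seed(1)
sign = [random.randint(0, 1) for _ in range(10_000)]       # e.g. "response mentions eyes"
diagnosis = [random.randint(0, 1) for _ in range(10_000)]  # e.g. "rated suspicious"
print(round(phi_coefficient(sign, diagnosis), 3))  # prints a value near 0
```

An observer who expects the sign to accompany the diagnosis will "see" the relationship in samples like this; the computed coefficient shows there is nothing to see.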

Reliability and Validity of Rorschach Inkblot Test

Reliability depends on the ability to achieve a given measurement consistently (Weiner & Greene, 2008). Viglione and Taylor (2003) specifically examined this issue using the Comprehensive System. They reported that in their own study, among 84 raters evaluating 70 Rorschach variables, there was strong inter-rater reliability, particularly for the base-rate variables. They also reviewed 24 previously published papers, all reporting various inter-rater reliabilities; most of these studies reported reliabilities in the range of 85% to 99%. Aside from inter-rater reliability, test-retest reliability is another important consideration. Exner (as cited in Groth-Marnat, 2009, pp. 389-90) reported reliabilities from .26 to .92 over a 1-year interval for 41 variables; four of them were above .90, 25 between .81 and .89, and 10 below .75. However, the most unreliable variables were attributed to state changes. It was further noted that the most relied-upon factors, ratios and percentages, were among the most reliable. Therefore, it can be concluded that the Comprehensive System can yield high reliability when used under the conditions applied in these studies.

Validity depends on the ability of a test to measure the constructs that it is purported to measure (Weiner & Greene, 2008). Validity in this case can be evaluated by comparing the Rorschach with clinical data or with other established tests of personality. Weiner (2001), for example, stated that the Rorschach has a validity effect size almost identical to that of the MMPI (Weiner, 2001, p. 423). Groth-Marnat (2009, p. 391) has pointed out that the results of validity studies on the Rorschach have been mixed, confounded by various factors including the type of scoring system, the experience of the scorer, and the type of population. Early studies produced validity scores of .40 to .50, but later studies found scores as low as .29. However, such studies were further complicated by variables such as age, number of responses, verbal aptitude, and education that were not controlled.

More recent studies of validity have also met with mixed results. Smith et al. (2010) evaluated the validity of the Rorschach in assessing the effects of trauma using a different system, the Logical Rorschach developed by Wagner (2001, as cited in Smith et al., 2010). They found equivocal results but indicated that the LR may have some validity in the assessment of trauma-related phenomena. Wood et al. (2010) evaluated the Rorschach using a meta-analysis of 22 studies including 780 forensic subjects, in an attempt to separate psychopaths from non-psychopaths. They reported a mean validity coefficient of 0.062 using all variables, and a validity of 0.232 using the Aggressive Potential index. They concluded that their findings "contradict the view that the Rorschach is a clinically sensitive instrument for discriminating psychopaths from non-psychopaths" (Wood et al., 2010, p. 336). Another negative result was reported by Lindgren, Carlsson, and Lundback (2007), who found no agreement between the Rorschach and self-assessed personality using the MMPI-2.

Advantages of the Rorschach inkblot test

- Higher inter-rater reliability through the Exner scoring system (1960s), a standard method for interpreting the test.
- Substantial evidence that scores are related to, and can identify, thought and psychotic disorders. Responses given by those with schizophrenia, bipolar disorder (manic phase), or schizotypal personality disorder have poor form quality (they do not fit the shape of the inkblots), and the disorganized thoughts and peculiarities of language of schizophrenics can be seen in their interpretations.
- Several scores correlate well with general intelligence: the number of responses, and how detailed and creative responses are.



- The Elizur Anxiety and Hostility scales (based on the emotional content of patients' responses) have a well-demonstrated relationship to anxious and hostile behaviours.
- The Rorschach Oral Dependency scale (ROD), based on responses that involve eating, mouths, or other oral imagery, appears to be a valid measure of normal variations in dependency.
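As a toy illustration of content-based scoring of this kind, one can tally responses containing oral imagery. The keyword list and the sample responses below are invented for the sketch and are not the actual ROD scoring criteria:

```python
# Invented keyword list for the sketch, not the real ROD coding manual.
ORAL_KEYWORDS = {"eating", "mouth", "food", "drinking", "kissing", "teeth"}

def oral_dependency_count(responses):
    """Count responses containing at least one oral-content keyword."""
    count = 0
    for text in responses:
        words = set(text.lower().split())
        if words & ORAL_KEYWORDS:  # any keyword present in this response
            count += 1
    return count

responses = [
    "two bears eating from the same bowl",
    "a bat with wings spread",
    "an open mouth with teeth",
]
print(oral_dependency_count(responses))  # → 2
```

Real content scales code transcribed responses against a detailed manual, but the underlying operation, counting responses that match defined content criteria, is exactly this kind of tally.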

Thematic Apperception Test

The Thematic Apperception Test, or TAT, is a projective measure intended to evaluate a person's patterns of thought, attitudes, observational capacity, and emotional responses to ambiguous test materials. In the case of the TAT, the ambiguous materials consist of a set of cards that portray human figures in a variety of settings and situations. The subject is asked to tell the examiner a story about each card that includes the following elements: the event shown in the picture; what has led up to it; what the characters in the picture are feeling and thinking; and the outcome of the event.

Because the TAT is an example of a projective instrument, that is, one that asks the subject to project his or her habitual patterns of thought and emotional responses onto the pictures on the cards, many psychologists prefer not to call it a "test," because that term implies that there are "right" and "wrong" answers to the questions. They consider the term "technique" to be a more accurate description of the TAT and other projective assessments.

Purpose

Individual assessments

The TAT is often administered to individuals as part of a battery, or group, of tests intended to evaluate personality. It is considered to be effective in eliciting information about a person's view of the world and his or her attitudes toward the self and others. As people taking the TAT proceed through the various story cards and tell stories about the pictures, they reveal their expectations of relationships with peers, parents or other authority figures, subordinates, and possible romantic partners. In addition to assessing the content of the stories that the subject is telling, the examiner evaluates the subject's manner, vocal tone, posture, hesitations, and other signs of an emotional response to a particular story picture.



For example, a person who is made anxious by a certain picture may make comments about the artistic style of the picture, or remark that he or she does not like the picture; this is a way of avoiding telling a story about it.

The TAT is often used in individual assessments of candidates for employment in fields requiring a high degree of skill in dealing with other people and/or ability to cope with high levels of psychological stress such as law enforcement, military leadership positions, religious ministry, education, diplomatic service, etc. Although the TAT should not be used in the differential diagnosis of mental disorders, it is often administered to individuals who have already received a diagnosis in order to match them with the type of psychotherapy best suited to their personalities. Lastly, the TAT is sometimes used for forensic purposes in evaluating the motivations and general attitudes of persons accused of violent crimes. For example, the TAT was recently administered to a 24-year-old man in prison for a series of sexual murders. The results indicated that his attitudes toward other people are not only outside normal limits but are similar to those of other persons found guilty of the same type of crime. The TAT can be given repeatedly to an individual as a way of measuring progress in psychotherapy or, in some cases, to help the therapist understand why the treatment seems to be stalled or blocked.

Research

In addition to its application in individual assessments, the TAT is frequently used for research into specific aspects of human personality, most often needs for achievement, fears of failure, hostility and aggression, and interpersonal object relations. "Object relations" is a phrase used in psychiatry and psychology to refer to the ways people internalize their relationships with others and the emotional tone of their relationships. Research into object relations using the TAT investigates a variety of different topics, including the extent to which people are emotionally involved in relationships with others; their ability to understand the complexities of human relationships; their ability to distinguish between their viewpoint on a situation and the perspectives of others involved; their ability to control aggressive impulses; self-esteem issues; and issues of personal identity. For example, one recent study compared responses to the TAT from a group of psychiatric inpatients diagnosed with dissociative disorders with responses from a group of non-dissociative inpatients, in order to investigate some of the controversies about dissociative identity disorder (formerly called multiple personality disorder).

Precautions

Students in medicine, psychology, or other fields who are learning to administer and interpret the TAT receive detailed instructions about the number of factors that can influence a person's responses to the story cards. In general, they are advised to be conservative in their interpretations, and to err "on the side of health" rather than of psychopathology when evaluating a subject's responses. In addition, the 1992 Code of Ethics of the American Psychological Association requires examiners to be knowledgeable about cultural and social differences, and to be responsible in interpreting test results with regard to these differences.

Experts in the use of the TAT recommend obtaining a personal and medical history from the subject before giving the TAT, in order to have some context for evaluating what might otherwise appear to be abnormal or unusual responses. For example, frequent references to death or grief in the stories would not be particularly surprising from a subject who had recently been bereaved. In addition, the TAT should not be used as the sole examination in evaluating an individual; it should be combined with other interviews and tests.

Cultural, gender, and class issues

The large number of research studies that have used the TAT indicates that cultural, gender, and class issues must be taken into account when determining whether a specific response to a story card is "abnormal" strictly speaking, or whether it may be a normal response from a person in a particular group. For example, the card labeled 6GF shows a younger woman who is seated turning toward a somewhat older man who is standing behind her and smoking a pipe. Most male subjects do not react to this picture as implying aggressiveness, but most female subjects regard it as a very aggressive picture, with unpleasant overtones of intrusiveness and danger. Many researchers consider the gender difference in responses to this card as a reflection of the general imbalance in power between men and women in the larger society.



Race is another issue related to the TAT story cards. The original story cards, which were created in 1935, all involved Caucasian figures. As early as 1949, researchers who were administering the TAT to African Americans asked whether the race of the figures in the cards would influence the subjects' responses. Newer sets of TAT story cards have introduced figures representing a wider variety of races and ethnic groups. As of 2002, however, it is not clear whether a subject's ability to identify with the race of the figures in the story cards improves the results of a TAT assessment.

Multiplicity of scoring systems

One precaution required in general assessment of the TAT is the absence of a normative scoring system for responses. The original scoring system devised in 1943 by Henry Murray, one of the authors of the TAT, attempted to account for every variable that it measures. Murray's scoring system is time-consuming and unwieldy, and as a result has been little used by later interpreters. Other scoring systems have since been introduced that focus on one or two specific variables, for example, hostility or depression. While these systems are more practical for clinical use, they lack comprehensiveness. No single system presently used for scoring the TAT has achieved widespread acceptance. The basic drawback of any scoring system in evaluating responses to the TAT story cards is that information that is not relevant to that particular system is simply lost.

Computer scoring

A recent subject of controversy in TAT interpretation concerns the use of computers to evaluate responses. While computers were used initially only to score tests with simple yes/no answers, they were soon applied to interpretation of projective measures. A computerized system for interpreting the Rorschach was devised as early as 1964. As of 2002, there are no computerized systems for evaluating responses to the TAT; however, users of the TAT should be aware of the controversies in this field. Computers have two basic limitations for use with the TAT: the first is that they cannot observe and record the subject's vocal tone, eye contact, and other aspects of behavior that a human examiner can note. Second, computers are not adequate for the interpretation of unusual subject profiles.

Description



The TAT is one of the oldest projective measures in continuous use. It has become the most popular projective technique among English-speaking psychiatrists and psychologists, and is better accepted among clinicians than the Rorschach.

History of the TAT

The TAT was first developed in 1935 by Henry Murray, Christiana Morgan, and their colleagues at the Harvard Psychological Clinic. The early versions of the TAT listed Morgan as the first author, but later versions dropped her name. One of the controversies surrounding the history of the TAT concerns the long and conflict-ridden extramarital relationship between Morgan and Murray, and its reinforcement of the prejudices that existed in the 1930s against women in academic psychology and psychiatry.

It is generally agreed, however, that the basic idea behind the TAT came from one of Murray's undergraduate students. The student mentioned that her son had spent his time recuperating from an illness by cutting pictures out of magazines and making up stories about them. The student wondered whether similar pictures could be used in therapy to tap into the nature of a patient's fantasies.

Administration

The TAT is usually administered to individuals in a quiet room free from interruptions or distractions. The subject sits at the edge of a table or desk next to the examiner. The examiner shows the subject a series of story cards taken from the full set of 31 TAT cards. The usual number of cards shown to the subject is between 10 and 14, although Murray recommended the use of 20 cards, administered in two separate one-hour sessions with the subject. The original 31 cards were divided into three categories: for use with men only, with women only, or for use with subjects of either sex. Recent practice has moved away from the use of separate sets of cards for men and women.

The subject is then instructed to tell a story about the picture on each card, with specific instructions to include a description of the event in the picture, the developments that led up to the event, the thoughts and feelings of the people in the picture, and the outcome of the story. The examiner keeps the cards in a pile face down in front of him or her, gives them to the subject one at a time, and asks the subject to place each card face down as its story is completed. Administration of the TAT usually takes about an hour.

Recording Murray's original practice was to take notes by hand on the subject's responses, including his or her nonverbal behaviors. Research has indicated, however, that a great deal of significant material is lost when notes are recorded in this way. As a result, some examiners now use a tape recorder to record subjects' answers. Another option involves asking the subject to write down his or her answers.

Interpretation There are two basic approaches to interpreting responses to the TAT, called nomothetic and idiographic respectively. Nomothetic interpretation refers to the practice of establishing norms for answers from subjects in specific age, gender, racial, or educational level groups and then measuring a given subject's responses against those norms. Idiographic interpretation refers to evaluating the unique features of the subject's view of the world and relationships. Most psychologists would classify the TAT as better suited to idiographic than nomothetic interpretation.

In interpreting responses to the TAT, examiners typically focus their attention on one of three areas: the content of the stories that the subject tells; the feeling or tone of the stories; or the subject's behaviors apart from responses. These behaviors may include verbal remarks (for example, comments about feeling stressed by the situation or not being a good storyteller) as well as nonverbal actions or signs (blushing, stammering, fidgeting in the chair, difficulty making eye contact with the examiner, etc.). The story content usually reveals the subject's attitudes, fantasies, wishes, inner conflicts, and view of the outside world. The story structure typically reflects the subject's feelings, assumptions about the world, and an underlying attitude of optimism or pessimism.

Results The results of the TAT must be interpreted in the context of the subject's personal history, age, sex, level of education, occupation, racial or ethnic identification, first language, and other characteristics that may be important. "Normal" results are difficult to define in a complex multicultural society like the contemporary United States.

Advantages of Thematic Apperception Test (TAT) Murray stated that, without exception, every person who participated in the study injected aspects of his or her personality into the stories. The test is especially useful for children, who can use the pictures to tell a story about their emotions and internal conflicts when they have difficulty expressing themselves directly in words. It is also useful in psychotherapy, where the themes of certain stories the client gives can be discussed even when they were not previously within the client's awareness. Finally, because subjects cannot tell how their responses will be interpreted, it is difficult to fake a response.

Rotter Incomplete Sentences Blank Purpose of RISB The Rotter Incomplete Sentences Blank is an attempt to standardize the sentence completion method for use at the college level. Forty stems are completed by the subject. These completions are then scored by comparing them against typical items in empirically derived scoring manuals for men and women, and by assigning each response a scale value from 0 to 6. The total score is an index of maladjustment.

The Sentence Completion Method The sentence completion method of studying personality is a semi structured projective technique in which the subject is asked to finish a sentence for which the first word or words are supplied. As in other projective devices, it is assumed that the subject reflects his own wishes, desires, fears and attitudes in the sentences he makes.

Historically, the incomplete sentence method is related most closely to the word association test. In some incomplete sentences tests only a single word or a brief response is called for; the major difference appears to be the length of the stimulus. In sentence completion tests, tendencies to block and to twist the meaning of the stimulus words appear, and the responses may be categorized in a fashion somewhat similar to the word association method.

Development of the ISB The Incomplete Sentences Blank consists of forty items revised from a form used by Rotter and Willerman in the army. This form was, in turn, a revision of blanks used by Shor, Hutt, and Holzberg at the Mason General Hospital.

In the development of the ISB, two objectives were kept in mind. One aim was to provide a technique which could be used objectively for screening and experimental purposes. It was felt that this technique should have at least some of the advantages of projective methods, and also be economical from the point of view of administration and scoring. A second goal was to obtain information of rather specific diagnostic value for treatment purposes.

The Incomplete Sentence Blank can be used, of course, for general interpretation with a variety of subjects in much the same manner that a clinician trained in dynamic psychology uses any projective material. However, a feature of ISB is that one can derive a single over-all adjustment score. This over-all adjustment score is of particular value for screening purposes with college students and in experimental studies. The ISB has also been used in a vocational guidance center to select students requiring broader counseling than was usually given, in experimental studies of the effect of psychotherapy and in investigations of the relationship of adjustment to a variety of variables.

Psychometric Properties 1. RELIABILITY

NOR SAHIDAH BINTI MOHAMAD ALI MPF1243 JB MP121197 880522-01-5516


Since the items on an incomplete sentences blank are not equivalent, the odd-even technique for determining reliability is not applicable and would tend to give a minimum estimate of internal consistency. Therefore the items on the ISB were divided into two halves deemed as nearly equivalent as possible. This yielded a corrected split-half reliability of .84 when based on the records of 124 male college students, and .83 when based on 71 female students. Inter-scorer reliability for two scorers trained by the authors was .91 for male records and .96 for female records.
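The "corrected" split-half coefficients reported above are conventionally obtained with the Spearman-Brown formula, which estimates full-length-test reliability from the correlation between the two half-tests. The source does not name the correction used, so treating it as Spearman-Brown is an assumption, and the function name below is illustrative:

```python
def spearman_brown(half_test_r):
    """Estimate full-test reliability from the correlation between two
    half-tests (Spearman-Brown prophecy formula for doubled length)."""
    return 2 * half_test_r / (1 + half_test_r)

# A half-test correlation of about .72 corrects to roughly the .84
# reported here for the 124 male college records.
print(round(spearman_brown(0.724), 2))
```

This also shows why the uncorrected odd-even figure would understate internal consistency: the correction always raises a positive half-test correlation.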

2. VALIDITY The Incomplete Sentences Blank was validated on groups of subjects which did not include any of the cases used in developing the scoring principles and the scoring manuals. Scoring of the blanks was done blindly: the scorer never knew whether the test blank was supposed to be that of a maladjusted or an adjusted subject.

Validity data were obtained for the two sexes separately, since the scoring manuals differ. The subjects included 82 females and 124 males who were classified as either adjusted or maladjusted, i.e., as needing personal counseling or as not needing such counseling. A cutting score of 135 provided a very efficient separation of adjusted and maladjusted students in the data collected above.

3. NORMS A distribution of scores on the ISB for a representative college freshman population was obtained by giving the Incomplete Sentences Blank to 299 entering freshmen at Ohio State University. A comparison between the median percentile ranks on the Ohio State Psychological Examination of the sample and of the total freshman population showed a difference of approximately two percentile points. The agreement between corresponding first and third quartile points was very close. It was interesting to find that the correlation coefficient between the Ohio State Psychological Examination scores and ISB scores for the selected freshman sample was only .11. This is in accord with the general expectation that very little relationship would exist between intelligence and scores on a personality measure such as the Incomplete Sentences Blank.

Scoring THE USE OF THE SCORING MANUAL Sentence completions are scored against examples in the scoring manuals by assigning a numerical weight from 0 to 6 to each sentence and totaling the weights to obtain the overall score. The scoring examples in Part II of the manual are given to facilitate the assignment of weights to responses. They are drawn from the ISB responses of 58 male and 53 female college students, ranging from extremely well-adjusted persons to those judged to be in need of psychotherapy. Since the scoring examples are illustrative and representative of common responses, with no intent to list all possible sentence completions, a set of scoring principles is also presented. These principles are intended to aid in determining the correct weight for a completion when a very similar statement cannot be found in the scoring examples.

Practice records are included in order to provide the potential user of the ISB with supervised experience before attempting to score clinical or experimental records; the correct scoring for these records is given at the end. These examples enable the clinician to check his scoring against that of the authors. They may also be used by a clinic supervisor to check the scoring ability of any student or general scorer. The sentence completions used for illustrative purposes in the following discussion are taken almost entirely from the manual.

SCORING PRINCIPLES OMISSION RESPONSES Omission responses are those for which no answer is given or for which the thought is incomplete. Omissions and fragments are not scored. It is recognized that in a clinical situation such responses are occasionally provocative, since they may point to areas which the individual does not recognize or cannot bring himself to express. For all responses subsumed under the heading of incomplete thoughts or omissions, no scoring is made; the total of the remaining responses is instead prorated by the formula {40 / (40 - omissions)} times the total score. However, if there are more than 20 omissions, the paper is considered unscorable for all practical purposes.

For example: Most girls . . . don't appeal to me except sexually because; or I hate . . . the thought of going home since. (Both completions leave the thought unfinished.)
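The proration rule above can be sketched in code. The helper name is hypothetical (it is not part of the published manual); the formula and the 20-omission cutoff are taken directly from the text:

```python
def prorate_isb_total(total_score, omissions, n_items=40):
    """Prorate an ISB total when some of the 40 completions are omissions
    or incomplete thoughts. Records with more than 20 omissions are
    considered unscorable, signalled here by returning None."""
    if omissions > n_items // 2:
        return None  # unscorable for all practical purposes
    return (n_items / (n_items - omissions)) * total_score

# 36 scored completions totaling 108 prorate to a full-blank total of 120.
print(prorate_isb_total(108, omissions=4))
```

The proration simply rescales the observed total to what it would be if all 40 items had been scored at the same average weight.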

CONFLICT RESPONSES C, or conflict, responses are those indicating an unhealthy or maladjusted frame of mind. These include hostility reactions, pessimism, symptom elicitation, hopelessness and suicidal wishes, statements of unhappy experiences, and indications of past maladjustment. Responses range from C1 to C3 according to the severity of the conflict or maladjustment expressed. The numerical weights for the conflict responses are C1 = 4, C2 = 5, and C3 = 6.

Typical of the C1 category are responses expressing concern over such things as the world state of affairs, financial problems, specific school difficulties, physical complaints, identification with minority groups, and so on. In general, subsumed under C1 are minor problems which are not deep-seated or incapacitating, and more or less specific difficulties.

More serious indications of maladjustment are found in the C2 category. On the whole these responses refer to broader, more generalized difficulties than those found in C1. Included here are expressions of inferiority feelings, psychosomatic complaints, concern over possible failure, generalized school problems, lack of goals, feelings of inadequacy, concern over vocational choice, and difficulty in heterosexual relationships, as well as generalized social difficulty.

Expressions of severe conflict or indications of maladjustment are rated C3. Among the difficulties found in this area are suicidal wishes, sexual conflicts, severe family problems, fear of insanity, strong negative attitudes toward people in general, feelings of confusion, and expressions of rather bizarre attitudes. For example, I like . . . to know if I am crazy falls in the C3 category; The happiest time . . . is over falls in the C2 category; and I want to know . . . about life falls in the C1 category.

POSITIVE RESPONSES P, or positive, responses are those indicating a healthy or hopeful frame of mind. These are evidenced by humorous or flippant remarks, optimistic responses, and acceptance reactions. Responses range from P1 to P3 depending on the degree of good adjustment expressed in the statement. The numerical weights for the positive responses are P1 = 2, P2 = 1, and P3 = 0.

In the P1 class, common responses are those which deal with positive attitudes toward school, hobbies, and sports, expressions of interest in people, expressions of warm feeling toward some individual, and so on. Generally found under the heading of P2 are those replies which indicate a generalized positive feeling toward people, good social adjustment, healthy family life, optimism, and humor. Clear-cut good-natured humor, real optimism, and warm acceptance are the types of responses subsumed under the P3 group. The ISB deviates from the majority of tests in that it scores humorous responses. For example, I like . . . to have a good time falls in the P1 category; The happiest time . . . is yet to come falls in the P2 category; and Back home . . . are many friends falls in the P3 category.

NEUTRAL RESPONSES N, or neutral, responses are those not falling clearly into either of the above categories. They are generally on a simple descriptive level. Two general types of responses account for a large share of those that fall in the neutral category. One group includes those lacking emotional tone or personal reference. The other group is composed of responses which are found as often among maladjusted as among adjusted individuals and which, through clinical judgment, could not legitimately be placed in either the C or the P group. All N responses are scored 3.

For example, Most girls . . . are females or When I was a child . . . I spoke as a child. These types of responses fall in the neutral category.
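Taken together, the category weights above (C1-C3 = 4-6, N = 3, P1-P3 = 2-0) reduce totaling to a lookup and a sum, and the cutting score of 135 from the validity studies can then flag a record for follow-up. The helper names below are hypothetical, and the direction of the flag (totals above the cutting score indicating maladjustment) is inferred from the statement that the total score is an index of maladjustment:

```python
# Numerical weights for each response category, as given in the manual.
WEIGHTS = {"C3": 6, "C2": 5, "C1": 4, "N": 3, "P1": 2, "P2": 1, "P3": 0}

def isb_total(ratings):
    """Total the weights over the rated completions (40 for a full record)."""
    return sum(WEIGHTS[r] for r in ratings)

def flag_for_counseling(total, cutting_score=135):
    """Totals above the cutting score suggest maladjustment."""
    return total > cutting_score

# A record of 40 neutral completions totals 120 and is not flagged.
ratings = ["N"] * 40
print(isb_total(ratings), flag_for_counseling(isb_total(ratings)))
```

Note that an all-neutral record scores 120, comfortably below the 135 cutting score, which is consistent with conflict-weighted completions being what pushes a record into the maladjusted range.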

Independent Scoring of Items Each response is to be scored and evaluated independently of all others, except when it makes a clear-cut reference to a previous statement. It is, of course, important in the scoring of any papers to avoid the halo effect as much as possible so that the measurement can be reliable. This is equally necessary here, for if each response is not scored independently of all others, there is a tendency to rate all responses in light of the over-all picture. In some cases a response refers directly to a previous item, and it would not be reasonable to score it independently of the first. In such an instance, therefore, the previous response must be used in the evaluation of the later one. For example, I wish . . . he were dead in one record had reference to the preceding sentence, in which the individual said, The only trouble . . . is I wish I could forget I'll be like my father. Another instance is I secretly . . . blame my mother, which refers to a preceding response, My father . . . was a suicide.

Qualification Responses which start like an example in the manual but are differently qualified are scored with consideration of these qualifications. For example, the following responses should be scored higher than if they had not been qualified: Sports . . . I have always liked, yet they don't hold my interest like they did. Or This school . . . is o.k., but it's too close to home. There are also responses which will be given lower ratings than they would get without the qualification. Common among these are responses given by individuals subsequent to therapy: The future . . . is uncertain, but I think I can lick it. Or Back home . . . life was pretty miserable, but I think I can cope with the situation now. Such qualifications may change the weighting of the response by one or more points.

Extreme Weights

In cases where a response seems more extreme than the examples cited, it is permissible to use an extreme weight. These weights should be assigned, however, only if clearly warranted. If the following responses were given they would be scored 6, although there are no examples listed for these items: Sports . . . should not be allowed for mixed groups because they are too stimulating. Or Reading . . . is one thing I hate.

Unusually Long Responses In cases where the response is unusually long, it should be given an additional point in the direction of C unless it has already been rated 6. It has been found that the maladjusted individual often writes long, involved sentences, as if compelled to express himself fully and not be misunderstood. The well-adjusted person, on the other hand, frequently replies to the stimuli with short, concise statements. For example, one poorly adjusted individual wrote, I am best when . . . I am under no pressure of responsibility concerning the accomplishment of a given thing within a certain specified time. An adjusted person wrote, I am best when . . . I'm having a party. This does not seem to be a function of intelligence, as might be hypothesized: the previous responses were from two individuals of superior intelligence, while the following are reactions of two individuals of lesser ability. The maladjusted student wrote, I like . . . agriculture. A well-adjusted individual wrote simply, I like . . . people. The only exception to this rule concerns neutral completions: if the response is a common quotation, stereotype, or song title, it is always scored as neutral, regardless of length.

Advantages of RISB The general advantages of the sentence completion method can be summarized as follows. There is freedom of response: the subject is not forced to answer yes, no, or ? to the examiner's questions, but may respond instead in any way he desires. Some disguise of the purpose of the test is present: although the subject is made aware of its general intent, what constitutes a good or bad answer is not readily apparent to most subjects.

Group administration is relatively efficient. Most incomplete sentences tests can be given to a group of any size without apparent loss of validity. No special training is ordinarily necessary for administration. Interpretation depends on the examiner's general clinical experience, although the examiner does not need specific training in the use of this method. The method is extremely flexible, in that new sentence beginnings can be constructed or tailor-made for a variety of clinical, applied, and experimental purposes.

Disadvantages of RISB Although susceptible to semi-objective scoring, the blank cannot be machine scored, and clinical appraisal and interpretation require general skill and knowledge of personality analysis. There is not as much disguise of purpose as in other projective methods; consequently, a sophisticated subject may be able to keep the examiner from knowing what he does not wish to reveal. Insufficient material is obtained in some cases, particularly from illiterate, disturbed, or uncooperative subjects. Application of the method as a group test also requires writing and language skills, and it has not yet been adequately evaluated for potential clinical usefulness with younger children.

Draw A Person Test Introduction Developed originally by Florence Goodenough in 1926, this test was first known as the Goodenough Draw-A-Man test. It is detailed in her book titled Measurement of Intelligence by Drawings. Dr. Dale B. Harris later revised and extended the test, and it is now known as the Goodenough-Harris Drawing Test. The revision and extension is detailed in his book Children's Drawings as Measures of Intellectual Maturity (1963). Psychologist Julian Jaynes, in his 1976 book The Origin of Consciousness in the Breakdown of the Bicameral Mind, wrote that the test is "routinely administered as an indicator of schizophrenia," and that while not all schizophrenic patients have trouble drawing a person, when they do, it is very clear evidence of a disorder. Specific signs could include a patient's neglect to include "obvious anatomical parts like hands and eyes," with "blurred and unconnected lines," ambiguous sexuality, and general distortion. There has been no validation of this test as indicative of schizophrenia. Chapman and Chapman (1969), in a classic study of illusory correlation, showed that the scoring manual, e.g., large eyes as indicative of paranoia, could be generated from the naive beliefs of undergraduates.

History It is not known exactly when figure drawing was first thought to be associated with personality. Whether it was a drawing on a cave wall, a painting by a great artist, or a doodle made by an average person, the curiosity somehow came about. The formal beginning of its use for psychological assessment, however, is known to begin with Florence Goodenough, a child psychologist, in 1926 (Scott, 1981).

Goodenough first became interested in figure drawing when she wanted to find a way to supplement the Stanford-Binet intelligence test with a nonverbal measure. The test was developed to assess maturity in young people. She concluded that the amount of detail involved in a child's drawing could be used as an effective tool. This led to the development of the first official assessment using figure drawing, the Draw-A-Man test. Over the years, the test has been revised many times with added measures for assessing intelligence (Weiner & Greene, 2008). Harris later revised the test to include drawings of a woman and of the child him- or herself. Now known as the Goodenough-Harris Test, it provides guidelines for assessing children from ages 6 to 17 (Scott, 1981).

Soon after the development of the test, psychologists started considering the test for measures of differences in personality as well as intelligence. In 1949, Karen Machover developed the first measure of figure drawing as a personality assessment with the Draw A Person Test (Machover, 1949).

Machover did a lot of work with disturbed adolescents and adults and used the test to assess people of all ages. She wrote a book on her measure expressing the view that the features of the figures drawn reflect underlying attitudes, concerns, and personality traits. In her test, she included a suggestion to ask subjects about the person they have drawn; she advises asking them to tell the administrator a story about the figure as if it were a character in a novel or play. Machover used a qualitative approach in her interpretation, considering individual drawing characteristics (Machover, 1949). Others have since suggested a more quantitative approach that can be more widely used, analyzing selected characteristics that form an index of deeper meanings (Murstein, 1965).

The most popular quantitative approach was developed by Elizabeth Koppitz. Koppitz developed a measure of assessment with a list of emotional indicators, including size of figures, omission of body parts, and certain special features. The total number of indicators present is simply added up to provide a number representing the likelihood of disturbance (Murstein, 1965).

With the Draw a Person test as a base, a number of other tests have developed using figure drawing as a personality assessment tool. For example, the House-Tree-Person test similarly just asks the person to draw those three objects and then inquires about what they have drawn. The questions asked for inquiry include what kinds of activities go on in the house, what are the strongest parts of the tree, and what things make the person angry or sad. The KFD (Kinetic Family Drawing) tells the drawer to draw their family doing something (Murstein, 1965).

All of these tests have the important element of not only the assessment of the pictures themselves, but also the thematic variables involved. Every figure drawing test asks the drawer to include some kind of description or interpretation of what is happening in the picture. These elements are also analyzed accordingly (Weiner & Greene,2008).

Nature Of The Test Test administration involves the administrator requesting children to complete three individual drawings on separate pieces of paper. Children are asked to draw a man, a woman, and themselves. No further instructions are given and the child is free to make the drawing in whichever way he/she would like. There is no right or wrong type of drawing, although the child must make a drawing of a whole person each time i.e. head to feet, not just the face. The test has no time limit; however, children rarely take longer than about 10 or 15 minutes to complete all three drawings. Harris's book (1963) provides scoring scales which are used to examine and score the child's drawings. The test is completely noninvasive and non-threatening to children, which is part of its appeal.

To evaluate intelligence, the test administrator uses the Draw-a-Person: QSS (Quantitative Scoring System). This system analyzes fourteen different aspects of the drawings (such as specific body parts and clothing) for various criteria, including presence or absence, detail, and proportion. In all, there are 64 scoring items for each drawing. A separate standard score is recorded for each drawing, and a total score for all three. The use of a nonverbal, nonthreatening task to evaluate intelligence is intended to eliminate possible sources of bias by reducing variables like primary language, verbal skills, communication disabilities, and sensitivity to working under pressure. However, test results can be influenced by previous drawing experience, a factor that may account for the tendency of middle-class children to score higher on this test than lower-class children, who often have fewer opportunities to draw. To assess the test-taker for emotional problems, the administrator uses the Draw-a-Person: SPED (Screening Procedure for Emotional Disturbance) to score the drawings. This system is composed of two types of criteria. For the first type, eight dimensions of each drawing are evaluated against norms for the child's age group. For the second type, 47 different items are considered for each drawing.
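The QSS bookkeeping described above (scoring items per drawing, a raw score for each of the three drawings, plus a combined total) can be illustrated with a small tally. This is a simplification under stated assumptions: the function name is hypothetical, the 64 items are treated as binary credit/no-credit (the real criteria also grade detail and proportion), and converting raw scores to the standard scores mentioned in the text would require the published age norms, which are not reproduced here:

```python
def qss_raw_scores(drawings):
    """drawings: three lists of 64 binary item credits, one list per
    drawing (man, woman, self). Returns the per-drawing raw scores
    and their sum across all three drawings."""
    assert len(drawings) == 3 and all(len(d) == 64 for d in drawings)
    raws = [sum(d) for d in drawings]
    return raws, sum(raws)

# Hypothetical child crediting 50, 48, and 45 of the 64 items.
man, woman, self_drawing = [1] * 50 + [0] * 14, [1] * 48 + [0] * 16, [1] * 45 + [0] * 19
print(qss_raw_scores([man, woman, self_drawing]))
```

Keeping the three drawing scores separate, as the test does, lets the examiner compare performance across the man, woman, and self drawings before combining them.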

The purpose of the test is to assist professionals in inferring children's cognitive developmental levels with little or no influence of other factors such as language barriers or special needs. Any other uses of the test are merely projective and are not endorsed by the first creator.

Advantages of Draw A Person Test: easy to administer (only about 20-30 minutes, plus 10 minutes of inquiry); helps people who have test anxiety, since there is no strict format; can assess people with communication problems; relatively culture-free; allows for self-administration.

Disadvantages of Draw A Person Test: only a restricted number of hypotheses can be developed; relatively nonverbal, but there may be some problems during inquiry; little research backing.

