
INTRODUCTION

ITEM ANALYSIS
Item analysis is a process which examines student responses to individual test items (questions)
in order to assess the quality of those items and of the test as a whole. Item analysis is especially
valuable in improving items which will be used again in later tests, but it can also be used to
eliminate ambiguous or misleading items in a single test administration. In addition, item
analysis is valuable for increasing instructors’ skills in test construction, and identifying specific
areas of course content which need greater emphasis or clarity.
Item analysis uses statistics and expert judgment to evaluate tests based on the quality of
individual items, item sets, and entire tests, as well as the relationship of each item to
other items. It “investigates the performance of items considered individually either in relation to
some external criterion or in relation to the remaining items on the test” (Thompson & Levitov,
2000, p. 163). It uses this information to improve item and test quality. Item analysis concepts
are similar for norm-referenced and criterion-referenced tests, but they differ in specific,
significant ways.

Item analysis refers to a statistical technique that helps instructors identify the effectiveness of
their test items. In developing quality assessment and specifically effective multiple-choice test
items, item analysis plays an important role in contributing to the fairness of the test, along with
identifying content areas that may be problematic for students.
Generally, the process of item analysis works best when class sizes exceed 50 students. In such
cases, item analysis can help in identifying potential mistakes in scoring, ambiguous items, and
alternatives (distractors) that don’t work. When performing item analysis, we are analyzing the
following important statistical information:

1) Item difficulty power / facility index
2) Item discrimination power
3) Item distractor power

ADVANTAGES OF ITEM ANALYSIS

1) It leads to the improvement of individual test items. The analysis of each of the items will
enable the test constructor/users know the effectiveness of each item. Item analysis
provides diagnostic information for determining the quality of the items.
2) In the view of Cronbach (1990, p. 178), statistical analysis of items spots questionable
items. When these items are reviewed or rewritten, they increase the validity of the test.
3) Item analysis makes it possible to shorten a test and at the same time increase its validity
and reliability. This is achieved because it helps us to choose items of suitable difficulty
level.
4) Item analysis leads to increased skill in test construction. Item analysis reveals ambiguity,
clues, ineffective distractors, and other technical defects that were missed during the
preparation of the test.

STEPS TO ITEM ANALYSIS


Gronlund (1976, pp. 264-265) identified the following steps as the procedure for item analysis:
1) Prepare a matrix table to rank the scores.
2) Rank the scores in order from the highest to the lowest score.
3) Select the 27% of testees with the highest scores and the 27% of the testees with the
lowest scores.
4) For each item of the test, tabulate the number of testees in the upper and lower groups
who select each alternative. This tabulation can be made directly on the test paper or on
a test item card.
5) Estimate the difficulty of each item (percentage of testees who got the item right in the
upper and lower groups).
6) Estimate the discriminating power of each item (difference between the number of
testees in the upper and lower groups who got the item right).
7) Evaluate the effectiveness of the distractors in each item (attractiveness of the
incorrect alternatives).

The first three steps of this procedure merely provide a convenient tabulation of testees'
responses, from which we can readily obtain estimates of item difficulty, item discriminating
power, and the effectiveness of each distractor. This latter information can frequently be
obtained simply by inspecting the item analysis data.

ITEM DIFFICULTY POWER / FACILITY INDEX

This indicates the proportion of students who got the item right. A high percentage indicates an
easy item/question and a low percentage indicates a difficult item. In general, items should have
difficulty values no less than 20% (probability of 0.2) and no greater than 80% (probability of
0.8). Very difficult or very easy items contribute little to the discriminating power of a test.

For items with one correct alternative worth a single point, the item difficulty is simply the
percentage of students who answer an item correctly. In this case, it is also equal to the item
mean. The item difficulty index ranges from 0 to 100; the higher the value, the easier the
question. When an alternative is worth other than a single point, or when there is more than one
correct alternative per question, the item difficulty is the average score on that item divided by
the highest number of points for any one alternative. Item difficulty is relevant for determining
whether students have learned the concept being tested. It also plays an important role in the
ability of an item to discriminate between students who know the tested material and those who
do not. The item will have low discrimination if it is so difficult that almost everyone gets it
wrong or guesses, or so easy that almost everyone gets it right.
Item difficulty is the percentage of people who answer an item correctly. It is the relative
frequency with which examinees choose the correct response (Thorndike, Cunningham,
Thorndike, & Hagen, 2004). It has an index ranging from a low of 0 to a high of +1.00. Higher
difficulty indexes indicate easier items. An item answered correctly by 75% of the examinees has
an item difficulty level of .75. An item answered correctly by 35% of the examinees has an
item difficulty level of .35. Item difficulty is a characteristic of the item and the sample that takes
the test.

Item difficulty is calculated by using the following formula (Crocker & Algina, 2000).

Difficulty = (number who answered the item correctly ÷ total number tested) × 100

A matrix table is needed to begin item analysis. It is a two-dimensional table with the students,
arranged from the highest scorer to the lowest, listed down the left-hand side (one row per
student), and the test items listed across the top (one column per item).

Student/Item    Item 1   Item 2   Item 3   ...   Item 50
Student 1                x                 ...
Student 2       x                          ...   x
Student 3       x                          ...   x
Student 4       x        x                 ...
...                      x                 ...   x
Student 40      x                          ...

Going student by student, tick each student’s answers into the cells of the chart. However, enter
only the wrong answers ( x ). Any empty cell will therefore signal a correct answer.
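
As an illustration, this tabulation can be sketched in a few lines of Python; the student names
and responses below are hypothetical, with True standing for a correct answer (an empty cell)
and False for a wrong one (an x):

```python
# A minimal sketch of the matrix table: rows are students, columns are items.
# True = correct answer (empty cell), False = wrong answer (marked "x").
# All names and responses below are hypothetical.

responses = {
    "Student1": [True,  True,  False, True],
    "Student2": [False, True,  True,  False],
    "Student3": [False, True,  True,  False],
    "Student4": [True,  False, True,  False],
}

# Step 2 of the procedure: rank the students from highest scorer to lowest.
ranked = sorted(responses.items(), key=lambda kv: sum(kv[1]), reverse=True)

for name, row in ranked:
    cells = [" " if correct else "x" for correct in row]
    print(f"{name}: {' '.join(cells)}  (score: {sum(row)})")
```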

To find the item difficulty for each item (Item 1, for example), count the total number of students
who got the item correct, divide by the total number of students who took the test, and multiply
by 100 to express the result as a percentage:

Item difficulty power = (total number who got the item correct ÷ total number who took the
test) × 100. An acceptable range is between 20% (0.2) and 80% (0.8).
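
A minimal sketch of this whole-group calculation (the response data are hypothetical):

```python
# Item difficulty = (number correct / number tested) * 100.
# Hypothetical data: 28 of 40 students answered Item 1 correctly.

def difficulty_percent(item_responses):
    """item_responses: one True/False entry per student."""
    return 100 * sum(item_responses) / len(item_responses)

item1 = [True] * 28 + [False] * 12
p = difficulty_percent(item1)
print(f"Difficulty = {p:.0f}%")                           # 70%
print("acceptable" if 20 <= p <= 80 else "too easy or too hard")
```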

Another way of determining item difficulty from the matrix table is to identify the top 10 scorers
and the bottom 10 scorers on the test. Set aside the remainder.

Go back to the upper 10 students. Count how many of them got Item 1 correct (these would be all
the empty cells). Write that number at the bottom of the column for those 10. Do the same for the
other items. We will call these sums RU, where U stands for "upper."

Repeat the process for the 10 lowest students. Write those sums under their columns. We will
call these sums RL, where L stands for "lower."

Difficulty index is just the proportion of people who passed the item. Calculate it for each
item by adding the number correct in the top group (RU) to the number correct in the bottom
group (RL) and then dividing this sum by the total number of students in the top and bottom
groups (20).

Difficulty = (RU + RL) ÷ 20
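
A minimal sketch of this upper/lower-group calculation (the counts are hypothetical):

```python
# Difficulty from the upper and lower groups only: (RU + RL) / 20,
# for groups of 10 students each. Counts below are hypothetical.

RU = 8                      # upper 10 students who got the item right
RL = 4                      # lower 10 students who got the item right

p = (RU + RL) / 20
print(f"Difficulty index = {p:.2f}")   # 0.60, inside the 0.2-0.8 range
```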

Your result can now be compared against the 0.2 to 0.8 range to determine how difficult or easy
the item is. But we cannot analyze items with item difficulty alone. This is where item
discrimination power comes in.

ITEM DISCRIMINATION POWER


Item discrimination refers to the ability of an item to differentiate among students on the basis of
how well they know the material being tested. Various hand calculation procedures have
traditionally been used to compare item responses to total test scores using high and low scoring
groups of students. Computerized analyses provide more accurate assessment of the
discrimination power of items because they take into account responses of all students rather
than just high and low scoring groups.
The discrimination index is the difference between the proportion of the top scorers who got an
item correct and the proportion of the bottom scorers who got the item right (each of these
groups consists of twenty-seven percent (27%) of the total group of students who took the test,
based on the students' total scores for the test). The discrimination index ranges between
-1 and +1. The closer the index is to +1, the more effectively the item distinguishes between the
two groups of students. Sometimes an item will discriminate negatively. Such an item should be
revised or eliminated from scoring, as it indicates that the lower-performing students actually
selected the key (correct response) more frequently than the top performers.

Discrimination power is concerned with establishing how well the correct option attracts
only those who know the material and fails to attract those who do not. In computing
discrimination power, the participants are categorized into three groups using a 27% margin:
high scorers, moderate scorers, and low scorers.
Using a matrix table of, for instance, forty participants, the group size will be 27% of 40:
27% of 40 = 10.8
This is rounded up to 11. Go to the matrix table and count the first 11 students, then move to
the bottom and count the last 11. These represent the upper scoring group and the lower
scoring group respectively, since the students were initially arranged from the highest scorers
to the lowest. In this analysis we do not bother about the middle scorers.
For discrimination power we analyze the correct options, using the following notation:
RU = number of those in the high scoring group who got the item correct
RL = number of those in the low scoring group who got the item correct
NU = number of those in the high scoring group
NL = number of those in the low scoring group

Discrimination power = (RU − RL) ÷ ((NU + NL) ÷ 2)
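
A minimal sketch of this calculation, using hypothetical counts and groups of 11 as in the
forty-participant example above:

```python
# Discrimination power = (RU - RL) / ((NU + NL) / 2).
# Hypothetical counts for upper/lower groups of 11 students each.

def discrimination_power(RU, RL, NU, NL):
    return (RU - RL) / ((NU + NL) / 2)

d = discrimination_power(RU=9, RL=3, NU=11, NL=11)
print(f"Discrimination power = {d:.2f}")   # (9 - 3) / 11 = 0.55
```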

The 27 percent is used because “this value will maximize differences in normal
distributions while providing enough cases for analysis” (Wiersma & Jurs, 2001, p. 145).
Comparing the upper and lower groups promotes stability by maximizing differences between
the two groups. The percentage of individuals included in the highest and lowest groups can
vary: Nunnally (2005) suggested 25 percent, while SPSS (2000) uses the highest and lowest
one-third.
Wood (2000) stated that "when more students in the lower group than in the upper group select
the right answer to an item, the item actually has negative validity. Assuming that the criterion
itself has validity, the item is not only useless but is actually serving to decrease the validity of
the test."

The higher the discrimination index, the better the item because high values indicate that the item
discriminates in favor of the upper group which should answer more items correctly. If more low
scorers answer an item correctly, it will have a negative value and is probably flawed.

A negative discrimination index occurs for items that are too hard or poorly written, which
makes it difficult to select the correct answer. On these items poor students may guess correctly,
while good students, suspecting that a question is too easy, may answer incorrectly by reading
too much into the question. Good items have a discrimination index of .40 and higher;
reasonably good items from .30 to .39; marginal items from .20 to .29, and poor items
less than .20 (Ebel & Frisbie, 2002).
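
The Ebel and Frisbie (2002) bands can be applied mechanically; a minimal sketch (the function
name is ours, not from the source):

```python
# Interpretation bands for the discrimination index (Ebel & Frisbie, 2002).

def rate_item(d):
    if d >= 0.40:
        return "good"
    if d >= 0.30:
        return "reasonably good"
    if d >= 0.20:
        return "marginal"
    return "poor (review; a negative value suggests a flawed item)"

for d in (0.55, 0.33, 0.22, 0.10, -0.40):
    print(f"{d:+.2f}: {rate_item(d)}")
```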

ITEM DISTRACTOR POWER


Distractor evaluation is another useful step in reviewing the effectiveness of a test item. All of
the incorrect options, or distractors, should actually be distracting. Preferably, each distractor
should be selected by a greater proportion of the lower scorers than of the top group. In order for
a distractor to be acceptable it should attract at least one candidate. If no one selects a distractor,
it is important to revise the option and attempt to make it a more plausible choice.
Distractor analysis is an extension of item analysis, using techniques similar to item
difficulty and item discrimination. In distractor analysis, however, we are no longer interested in
how test takers select the correct answer, but in how effectively the distractors draw test takers
away from the correct answer. The number of times each distractor is selected is noted in order
to determine its effectiveness. We would expect each distractor to be selected by enough
candidates for it to be viable. What exactly is an acceptable value? This depends to a large
extent on the difficulty of the item itself and what we consider to be an acceptable item
difficulty value for test items. If we assume that 0.7 is an appropriate item difficulty value,
then we should expect the remaining 0.3 to be about evenly distributed among the distractors.
Let us take the following test item as an example:

In the story, he was unhappy because …………

A. it rained all day
B. he was scolded
C. he hurt himself
D. the weather was hot

Let us assume that 100 students took the test. If we assume that A is the answer and the item
difficulty is 0.7, then 70 students answered correctly. What about the remaining 30 students and
the effectiveness of the three distractors? If all 30 selected D, the distractors B and C are useless
in their role as distractors. Similarly, if 15 students selected D and another 15 selected B,
then C is not an effective distractor and should be replaced. The ideal situation would be for each
of the three distractors to be selected by 10 students. Therefore, for an item which has an item
difficulty of 0.7, the ideal effectiveness of each distractor can be quantified as 10/100 or 0.1.
What would be the ideal value for distractors in a four option multiple choice item when the item
difficulty of the item is 0.4? Hint: You need to identify the proportion of students who did not
select the correct option.
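
Following the hint, a one-line check of the arithmetic behind this exercise:

```python
# With item difficulty 0.4 on a four-option item, 1 - 0.4 = 0.6 of the
# students chose a distractor; spread evenly over the 3 distractors:
difficulty = 0.4
ideal_per_distractor = (1 - difficulty) / 3
print(ideal_per_distractor)   # 0.2
```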
From a different perspective, the item discrimination formula can also be used in distractor
analysis. The concept of upper groups and lower groups would still remain, but the analysis and
expectation would differ slightly from the regular item discrimination that we have looked at
earlier. Instead of expecting a positive value, we should logically expect a negative value, as
more students from the lower group should select distractors. Each distractor can have its own
item discrimination value in order to analyse how the distractors work and ultimately refine the
effectiveness of the test item itself. If we use the above item as an example, the item
discrimination concept can be used to assess the effectiveness of each distractor. If a class has
100 students, we can form upper and lower groups of 30 students each. Assume the following are
observed:

Distractor                  Upper group students   Lower group students   Discrimination value/index
                            who selected it        who selected it
A. It rained all day        20                     10                     (20 − 10) ÷ 30 = 0.33
B. He was scolded           3                      3                      (3 − 3) ÷ 30 = 0
C. He hurt himself          4                      16                     (4 − 16) ÷ 30 = −0.4
D. The weather was hot      3                      1                      (3 − 1) ÷ 30 = 0.07

The values in the last column of the table can once again be interpreted according to how we
examined item discrimination values, but with a twist. Alternative A is the key, and a positive
value is what we would want. However, the value of 0.33 is rather low considering the
maximum value is 1. The value for distractor B is 0, and this tells us that the distractor did not
discriminate between the proficient students in the upper group and the weaker students in the
lower group. Hence, the effectiveness of this distractor is questionable. Distractor C, on the
other hand, seems to have functioned effectively: more students in the lower group than in the
upper group selected this distractor. As our intention in distractor analysis is to identify
distractors that would seem to be the correct answer to weaker students, distractor C seems to
have done its job. The same cannot be said of the final distractor. In fact, the positive value
obtained here indicates that more of the proficient students selected this distractor. We should
understand by now that this is not what we would hope for.
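
A minimal sketch that reproduces the table above (the counts are taken from the hypothetical
example):

```python
# Per-distractor discrimination: (upper count - lower count) / group size.
# Counts reproduce the hypothetical table above (groups of 30 students).

counts = {          # option: (upper-group selections, lower-group selections)
    "A. It rained all day (key)": (20, 10),
    "B. He was scolded":          (3, 3),
    "C. He hurt himself":         (4, 16),
    "D. The weather was hot":     (3, 1),
}

GROUP_SIZE = 30
for option, (upper, lower) in counts.items():
    d = (upper - lower) / GROUP_SIZE
    print(f"{option}: {d:+.2f}")
# A: +0.33 (key: positive, as desired)   B:  0.00 (does not discriminate)
# C: -0.40 (effective distractor)        D: +0.07 (attracts the upper group)
```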

Distractor analysis can be a useful tool in evaluating the effectiveness of our distractors. It is
important for us to be mindful of the distractors that we use in a multiple choice format test:
when distractors are not effective, they are virtually useless. As a result, there is a greater
possibility that students will be able to select the correct answer by guessing, as the options
have effectively been reduced.

OTHER ITEM ANALYSIS STATISTICS


POINT BISERIAL
Point biserial is the correlation between an individual student's performance on an item and his
or her total score on the test. The values range from -1 to +1. High positive values are
desirable for the correct answer because they indicate that a student who did well on the exam
also did well on this question. Negative values are desirable for the alternatives or distractors
that were not the correct answer. A value of 0 or less for the correct alternative indicates the
question has difficulty distinguishing between those students who know the material and those
who do not. The question should be examined, revised, and potentially eliminated from
scoring.

Some test analysts may desire more complex item statistics. Two correlations which are
commonly used as indicators of item discrimination are shown on the item analysis report. The
first is the biserial correlation, which is the correlation between a student's performance on an
item (right or wrong) and his or her total score on the test. This correlation assumes that the
distribution of test scores is normal and that there is a normal distribution underlying the
right/wrong dichotomy. The biserial correlation has the characteristic, disconcerting to some, of
having maximum values greater than unity. There is no exact test for the statistical significance
of the biserial correlation coefficient.

The point biserial correlation is also a correlation between student performance on an item (right
or wrong) and test score. It assumes that the test score distribution is normal and that the division
on item performance is a natural dichotomy. The possible range of values for the point biserial
correlation is +1 to -1. The Student's t test for the statistical significance of the point biserial
correlation is given on the item analysis report. Enter a table of Student's t values with N - 2
degrees of freedom at the desired percentile point, where N is the total number of students
appearing in the item analysis.
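
A minimal sketch of this computation, using scipy's point biserial routine; the item and total
scores below are hypothetical:

```python
# Point biserial correlation between item score (1 = right, 0 = wrong)
# and total test score, with the Student's t statistic on N - 2 df.
# All scores below are hypothetical.
import math
from scipy import stats

item  = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]             # right/wrong per student
total = [48, 45, 30, 42, 28, 44, 40, 25, 46, 33]   # total test scores

r, p_value = stats.pointbiserialr(item, total)
n = len(item)
t = r * math.sqrt((n - 2) / (1 - r ** 2))          # t with n - 2 df

print(f"r_pb = {r:.2f}, t({n - 2}) = {t:.2f}, p = {p_value:.4f}")
```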

The mean scores for students who got an item right and for those who got it wrong are also
shown. These values are used in computing the biserial and point biserial coefficients of
correlation and are not generally used as item analysis statistics.

A CAUTION IN INTERPRETING ITEM ANALYSIS RESULT


W. A. Mehrens and I. J. Lehmann provide the following set of cautions in using item analysis
results (Measurement and Evaluation in Education and Psychology. New York: Holt, Rinehart
and Winston, 2000, pp. 333-334):

 Item analysis data are not synonymous with item validity. An external criterion is
required to accurately judge the validity of test items. By using the internal criterion of
total test score, item analyses reflect internal consistency of items rather than validity.
 The discrimination index is not always a measure of item quality. There is a variety of
reasons an item may have low discriminating power: (a) extremely difficult or easy items
will have low ability to discriminate, but such items are often needed to adequately
sample course content and objectives; (b) an item may show low discrimination if the test
measures many different content areas and cognitive skills. For example, if the majority
of the test measures “knowledge of facts,” then an item assessing “ability to apply
principles” may have a low correlation with total test score, yet both types of items are
needed to measure attainment of course objectives.
 Item analysis data are tentative. Such data are influenced by the type and number of
students being tested, instructional procedures employed, and chance errors. If repeated
use of items is possible, statistics should be recorded for each administration of each
item.

LIMITATIONS OF ITEM ANALYSIS

Item analysis is a completely futile process unless the results help instructors improve their
classroom practices and item writers improve their tests. Let us suggest a number of points of
departure in the application of item analysis data.

1) It is not commonly used in the analysis of essay items.
2) It is not used for supply-type objective tests.
3) It is only used when the test involves a large population of students.
4) It requires the preparation of a large number of test items.
5) Generally, item statistics will be somewhat unstable for small groups of students. Perhaps
fifty students might be considered a minimum number if item statistics are to be stable.
Note that for a group of fifty students, the upper and lower groups would contain only
thirteen students each. The stability of item analysis results will improve as the group of
students is increased to one hundred or more. An item analysis for very small groups
must not be considered a stable indication of the performance of a set of items.

CONCLUSION
Item analysis is the process of testing items to ascertain whether each item is functioning
properly in measuring what the entire test is measuring. It begins after the test has been
administered and scored. It also involves a detailed and systematic examination of the
testees' responses to each item to determine the difficulty level and discriminating power of
the item.
This also includes determining the effectiveness of each option. The decision on the quality of an
item depends on the purpose for which the test was designed. However, for an item to effectively
measure what the entire test is measuring and provide valid and useful information, it should not
be too easy or too difficult.

REFERENCES
Adedokun, J. A. (2012). Educational measurement, assessment, evaluation and statistics.
Lagos: New Hope Educational Publishers.

Ebel, R. L., & Frisbie, D. A. (2002). Essentials of educational measurement. Englewood Cliffs,
NJ: Prentice-Hall.

Educational Testing Service. (2005, August 10). What’s the DIF? Helping to ensure test
question fairness. Princeton, NJ: Educational Testing Service.

Mehrens, W. A., & Lehmann, I. J. (2000). Measurement and evaluation in education and
psychology. New York: Holt, Rinehart and Winston.

Nunnally, J. C. (2005). Educational measurement and evaluation (4th ed.). New York:
McGraw-Hill.

SPSS. (2000). Item analysis. Chicago: Statistical Package for the Social Sciences.

Thorndike, R. M., Cunningham, G. K., Thorndike, R. L., & Hagen, E. P. (2004). Measurement
and evaluation in psychology and education (5th ed.). New York: Macmillan.

Wiersma, W., & Jurs, S. G. (2001). Educational measurement and testing (2nd ed.). Boston,
MA: Allyn and Bacon.

Wood, D. A. (2000). Test construction: Development and interpretation of achievement tests.
Columbus, OH: Charles E. Merrill Books.

ATTITUDE SCALE QUESTIONNAIRE
SCHOOL………………………………………. LEVEL ……………………………………..
GENDER…………………………FACULTY/DEPARTMENT…………………………
TITLE: ATTITUDE TO ABORTION

10 POSITIVE STATEMENTS
1) - 10) [blank ruled lines for the statements]

10 NEGATIVE STATEMENTS
1) - 10) [blank ruled lines for the statements]

UNIVERSITY OF LAGOS
FACULTY OF EDUCATION
SCHOOL OF POSTGRADUATE STUDIES

COURSE TITLE:
ADVANCED MEASUREMENT AND EVALUATION

COURSE CODE
EDF 809

NAME                         MATRIC NO    COHORT
ODUH CHIBUEZE GODFREY        159034016    MEASUREMENT AND EVALUATION
IJIYEMI OLUWASEUN MARGARET   100310046    MEASUREMENT AND EVALUATION
OLAMIJU KIKELOMO PRECIOUS    089034085    EDUCATIONAL PSYCHOLOGY

LECTURER: Dr. Mrs. O. M. ALADE

GROUP SIX (6) (REGULAR)

TOPIC: ITEM ANALYSIS

SESSION: 2015/2016

ATTITUDE TO ABORTION
Please kindly tick as appropriate.
AGE: 18 to 25 ( ), 26 to 30 ( ), 31 to 35 ( ), 36 to 40 ( ), 40 and above ( ).
SA (strongly agree), A (agree), U (undecided), D (disagree), SD (strongly disagree).

s/n statements SA A U D SD
1 Abortion helps to reduce population
2 It promotes illicit sexual activities in young people.
3 It helps to save the life of medically at risk mother.
4 It helps to roll away shame for pregnant young people.
5 It helps some doctors to make ends meet.
6 It can lead to severe damage of the womb.
7 It helps ladies to pursue happiness.
8 It prevents complications arising from pregnancy.
9 It helps to check family size.
10 In the event of rape, it helps to remove an unwanted child.
11 It reduces the number of children proposed by couples.
12 It can cause the death of the woman.
13 It can lead to barrenness when the womb is damaged.
14 In some religious beliefs, it is a sin.
15 It can lead to low self esteem of an individual.
16 It can cause infections in the body.
17 It prevents the birth of children out of wedlock.
18 It helps to prevent disgrace.
19 Abortion is dangerous to human life.
20 Abortion can be termed brutality.
21 Abortion is capital intensive.
22 Abortion breaks marriages, relationships and homes.
23 Abortion is a murderous act.
24 Abortion safeguards women’s health.
