(6507) Educational Measurement and Evaluation New

1. Personality inventories are standardized questionnaires that measure traits like social skills, motivations, strengths, and attitudes. They are self-assessment tools used in career counseling to help people understand their personality type and what careers may be a good fit.
2. Some examples of commonly used personality inventories described in the document are the Cornell Index, Bell Adjustment Inventory, Allport A-S Inventory, Bernreuter Inventory, and Minnesota Multiphasic Personality Inventory. These inventories measure traits like neuroticism, emotional stability, ascendance, sociability, and more.
3. The results of a personality inventory can help an individual learn about themselves and explore careers that align with their personality.


Q#01

What techniques are useful for measuring behaviours? Why were personality inventories developed? Explain their utility.

Measurement

It is the process of applying quantitative labels to observed properties of events using a standard set of rules.

Measuring Behaviour
Behaviour is measured in many different settings. At the Measuring Behavior conference, for example, researchers present methods and techniques for gaining insight into pig behaviour in a stable, teacher behaviour in a classroom, visitor behaviour in a museum, crab behaviour in an open field, citizen behaviour in a smart city, consumer behaviour in a store, and much more.

Some of the important indicators/techniques which are employed to measure verbal and
non-verbal behaviour are mentioned below:

1. Response Time or Latency:


One factor which is normally employed to measure behaviour is the time taken for an
individual to produce a response. A classic example of this is the reaction-time experiment.

2. Duration of Response:
Another factor taken into consideration for measuring behaviour is the duration of time
for which a particular behaviour or response occurs. Measurements of after-images and other
such sensory experiences use this type of index. Suppose you look at a bright green light.

The experienced greenness may remain for a moment even after you cease looking at the
light. Similarly, when you hear a loud sound prolonged for some time, or inhale a strong perfume
for a long time, the sound and the perfume will remain for some time even after these stimuli are
withdrawn.

3. Time Taken for a Response to be completed:


This measure is used very widely in measuring learning, intelligence, and other abilities.
For example, in Skinner's learning experiment or Thorndike's trial-and-error learning
experiment, one of the criteria employed to measure whether the rat or cat has learnt the correct
path is the time taken by the animal to reach its goal.

4. Frequency of Response:
The number of times a particular response occurs within a given time or on a particular
occasion is another indicator. An example of this type can be seen in the measurement of
fluctuation of attention. Experiments on fluctuation of attention employ, as an index, the number
of times attention shifts from one aspect of a given stimulus to another within a stipulated time
limit.

5. Amount of Response:
In measuring emotional behaviour the amount or intensity of glandular and muscular
responses is employed as an indicator. If a person’s aggression has to be measured, then the
experimenter may try to measure the subject’s blood pressure, rate of respiration, rate of
heartbeat, gestures, tone, facial and other expressions accompanied by certain psychological
changes. Only after analyzing and combining a variety of such data does one arrive at a measure
indicating the overall reaction of aggression, or the total amount of aggressive reaction.

6. Number of Trials Required:


Yet another indicator used is the number of trials, practice attempts, or presentations of a
certain stimulus. This is very commonly used in experiments on learning. In most of the learning
experiments the number of attempts required by an organism to learn a task to a standard or
criterion is used as an index. Similarly, experiments on remembering also employ the number of
presentations or trials required for a person to learn verbal material to the point of perfect
recall.

7. Complexity and Difficulty of Response:


The more complex and difficult a particular response, the higher the score. The concept of
mental age is based on this. Some of the items of aptitude tests and intelligence tests are planned
in such a way that the difficulty level is deliberately increased.

For instance, in Binet’s intelligence test the items are arranged in such a manner that the
complexity and difficulty level is increased gradually as the test advances. Thus, we see that
psychologists employ different kinds of measures depending on the nature of the behaviour and
the purpose of the measurement.

Personality inventories
Personality inventories, also called objective tests, are standardized and can be
administered to a number of people at the same time. A psychologist need not be present when
the test is given, and the answers can usually be scored by a computer. Scores are obtained by
comparison with norms for each category on the test. A personality inventory may measure one
factor, such as anxiety level, or it may measure a number of different personality traits at the
same time, such as the Sixteen Personality Factor Questionnaire (16 PF).

Explanation
A personality inventory is a self-assessment tool that career counselors and other career
development professionals use to help people learn about their personality types. It reveals
information about individuals' social traits, motivations, strengths and weaknesses, and attitudes.
Experts believe these factors play an important role in job and career success and satisfaction.

What Personality Inventories Can Do


 Personality inventories can teach you about yourself, which will help you
learn what occupations and work environments are a good fit.
 In addition to just learning about your personality, to determine whether a
career is right for you it is essential to also consider factors such as
interests, values, and aptitudes.
 A self-assessment, including taking a personality inventory, is just one
step you must take to find the right career. Explore the occupations that
seem to be a good match based on your results. Consider job duties,
earnings, requirements, and occupational outlook to find out if you should
pursue a particular career.

Cornell Index
This was developed during World War II, and the civilian revision of this questionnaire
contains 101 yes-no items pertaining to feelings of fear and inadequacy, depression and other
pathological mood reactions, nervousness and anxiety, etc., and several kinds of psychosomatic
symptoms. The time required to complete the inventory varies from 5 to 15 minutes.

Bell Adjustment Inventory


In this inventory, 35 items are classified into separate categories and a score is provided
for each category. Two forms of this inventory are available, one for high school and college
students, the other for adults. However, the student form has been employed more widely. This
form has been designed to measure adjustment in four areas: (1) Home, (2) Health, (3) Social,
(4) Emotional. An additional score has been provided in the adult form to measure occupational
adjustment. Answers are recorded by encircling 'Yes' or 'No'.

Allport A-S Inventory


This is also known as the Ascendance-Submission Inventory. It seeks to assess
the individual's tendency to dominate his associates, or be dominated by them, in the face-to-face
contacts of everyday life. Each item begins with a brief description of a situation which we
usually encounter at a meeting, in school, on a bus, or in other familiar settings. The subject is
asked to indicate one of the two or four alternative ways listed for meeting the situation.
Responses indicate the degree of ascendance or submission. Separate forms of the test are
available for men and women. This inventory has greatly influenced the development of many
other inventories.

Bernreuter Inventory
It consists of 125 items and is designed to yield six scores: (1) Neuroticism, (2) Self-
sufficiency, (3) Introversion, (4) Dominance, (5) Confidence, (6) Sociability. The last two were
added by Flanagan.

The manual provides norms on all six scores for high school, college, and general adult
populations.

The inventory appears to be more effective with normal and near-normal subjects than with
psychotics.

Heston Personal Adjustment Inventory


This yields scores in Analytical Thinking, Sociability, Emotional Stability, Confidence,
Personal Relations, and Home Satisfaction.

Guilford Inventory
The latest inventory in the Guilford series consists of 300 items, 30 for each of the
following 10 scores:

G. General Activity

R. Restraint

A. Ascendance

S. Sociability

E. Emotional Stability

O. Objectivity

F. Friendliness

T. Thoughtfulness

P. Personal Relations

M. Masculinity

Cattell’s Inventory
Cattell has devised the Sixteen Personality Factor Questionnaire. This inventory is
available in two parallel forms, A and B, each containing 187 items. The use of both forms is
advocated for greater reliability.

The Minnesota Multiphasic Personality Inventory


Published in 1943, it consists of 550 items and is used for persons of 16 years of age and
above. Each item is printed on a separate card. The subject has to sort the cards into three groups:
True, False, and Cannot Say. The items are classified under 26 heads, such as health, religious
attitudes, delusions, phobias, etc. Items can be grouped in separate scales to score nine
personality traits. These are hypochondriasis, depression, hysteria, psychopathic deviate,
masculinity-femininity, paranoia, psychasthenia, schizophrenia, and hypomania. This inventory is
used in clinical diagnosis. Apart from its comprehensiveness, the MMPI is provided with several
control keys meant for identifying untrustworthy responses. These keys give such scores as the lie
score (L) when the subject tries to fake good on socially approved behaviour; the F score when the
subject fakes bad to show himself in a bad light, or marks items carelessly or through
misunderstanding; the question score (?) when the subject gives a large number of Cannot Say
responses; and the K score, which serves as a correction for test-taking defensiveness. The MMPI
is one of the most widely used inventories. A shortened version consisting of 336 items is also
available for emergency use.

The Guilford-Zimmerman Temperament Survey


It is meant to identify ten different trait dimensions of personality. Some of these are
general activity, friendliness, thoughtfulness, personal relations, and masculinity. The inventory
is used with adolescents and adults. These traits have been included after factor analysis and are
mutually exclusive. The sample used for standardization consisted of normal persons, not of
maladjusted or neurotic ones.

The Edwards Personal Preference Schedule


It consists of 210 items, which assess the strength of 15 needs selected from among those
listed by Murray. The items are presented to the subject in pairs that are more or less equated
for social desirability, so that the subject responds as he really feels, and not in terms of what
is the approved or desirable thing to say.

Evaluation of Personality Inventories
The construction and the use of personality inventories are beset with special difficulties in view
of the following:

1. Complex nature of personality

2. Different definitions of personality

3. Greater specificity of responses in the sphere of personality.

4. Lack of adequate criteria for the determination of empirical validity.

5. Different actions by the same individual in different situations.

The main advantages of personality inventories include understanding the candidate
better, an impartial recruitment process, reduced time-to-hire, improved ROI, identification of
dark personality traits, and a greater probability of landing the best-fit candidate.

Q#02

How can the length and item difficulty of a test influence the appropriate
assessment of students? Explain other considerations that help to develop
appropriate test items.

Test length
The length of a test is also an important factor in obtaining a representative sample. Test
length is determined when the set of specifications is built and depends on such factors as the
purpose of testing, the types of test items used, the age of the pupils, and the level of reliability
needed for effective test use. Thus, a criterion-referenced mastery test over a third-grade social
studies unit might contain 30 objective items, whereas a norm-referenced survey test over a
tenth-grade social studies course might contain more than 100 objective items and several essay
questions. Although there are no hard and fast rules for determining test length, an important
consideration from a sampling standpoint is the number of test items devoted to each specific
area being measured. We want our classroom test to be long enough to provide an adequate
sampling of each objective and each content area. As a rule of thumb, it is desirable when
constructing a criterion-referenced mastery test to use at least ten objective test items to measure
each specific learning outcome. This number, however, might be lowered to as few as five if the
task is extremely limited (e.g., "Adds two single-digit numbers", "Capitalizes proper names") and
the pupils are to supply the answers rather than select them. For a norm-referenced test, where
the sample of test items typically covers a broad area and emphasis is on the total score, using
several objective test items for each specific learning outcome and ten or more for each general
objective would probably be sufficient.
Special problems of sampling arise when complex learning outcomes are being measured,
because here we must turn to more elaborate objective-type items and essay questions. Both item
types require considerable testing time, yet a single test exercise is still inadequate for measuring
the intended outcome. In one exercise calling for the interpretation of graphs, for example, the
nature of the data or the type of graph may be the most influential factor in determining whether
it is interpreted properly. When several graphs are used, the effect of such factors is minimized,
and we obtain a more representative sample of the ability to interpret graphs. A similar situation
occurs with the use of essay questions: the answer to any single question depends too heavily on
the particular sample of information called for by that question. Thus, the only feasible approach
is to confine each test of complex outcomes to a rather limited area (e.g., graph interpretation,
problem solving) and to test more often. In any event, our aim should be to obtain as
representative a sample of pupil performance as possible in each area to be tested. Other things
being equal, the greater the number of test items, the greater the likelihood of an adequate
sample, and thus the more reliable the results.

Proper Item Difficulty


The difficulty of the items to be included in a classroom test depends largely on whether
the test is being designed to describe the specific learning tasks the students can perform (e.g., a
criterion-referenced test, CRT) or to rank the students in order of their achievement (e.g., a
norm-referenced test, NRT).

Item Difficulty and Criterion-Referenced Testing. The difficulty of a test item in a
criterion-referenced mastery test is determined by the nature of the specific learning task to be
measured. If the learning tasks are easy, the test items should be easy. If the learning tasks are
moderately difficult, the test items should be moderately difficult. No attempt should be made to
manipulate item difficulty or to eliminate easy items from the test in order to obtain a range of
test scores. On a criterion-referenced mastery test we would expect all, or nearly all, students to
obtain high scores when the instruction has been effective. Special care should be taken to avoid
irrelevant barriers (e.g., ambiguity) to the answers, unintended clues to the correct response, or
any other factor that might alter the level of difficulty of the test task.

For criterion-referenced tests at the developmental level of learning, we need test items of
varying difficulty for each instructional objective. Ideally, the difficulty of the test tasks would
be derived directly from the instructional content, rather than from some arbitrary attempt to
manipulate item difficulty. It should also be kept in mind that in criterion-referenced testing at
this developmental level, unlike mastery testing, a wide range of scores is expected.

Item Difficulty and Norm-Referenced Testing. Because norm-referenced tests are
designed to rank students in order of achievement, deliberate attempts are made to obtain a wide
spread of scores. That is why easy items are eliminated, to maximize the differences in students'
performance. Maximum differentiation among students in terms of achievement is obtained
when the average score is near the midpoint of the possible scores and the scores range from
near zero to near perfect. The average difficulty to aim for on a 100-item test, for various
choice-type items, would be as follows:

Item Type                               Chance Score    Average Difficulty

Two-choice items (e.g., true-false)          50                 75
Three-choice multiple-choice items           33                 67
Four-choice multiple-choice items            25                 63
Five-choice multiple-choice items            20                 60
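The average difficulty figures in the table are simply the midpoint between the chance score (100 divided by the number of choices) and a perfect score of 100. A short sketch of that arithmetic; the function name is ours, not the document's:

```python
def target_difficulty(num_choices):
    """Midpoint between the chance score and 100, rounded half up
    to match the table of average difficulties above."""
    chance = 100 / num_choices
    return int((chance + 100) / 2 + 0.5)

# true-false: 75, three-choice: 67, four-choice: 63, five-choice: 60
targets = [target_difficulty(k) for k in (2, 3, 4, 5)]
```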

One way of ensuring that tests have a desirable influence on pupil learning is
to pay particular attention to the breadth of content and learning outcomes
measured by the tests
When we select a representative sample of content from all of the areas covered in our
instruction, we are emphasizing to our pupils that they must devote attention to all areas: they
cannot neglect some aspects of the course and still do well on the tests. Similarly, when our tests
measure a variety of types of learning outcomes, the pupils soon learn that it is not enough to
recall a mass of memorized facts; they must also develop conceptual understandings, draw
conclusions, recognize assumptions, identify cause-and-effect relations, and the like. This
discourages them from depending solely on memorization as a basis for learning and encourages
them to use more complex mental processes.

The practice of constructing tests that measure a variety of learning outcomes


should also lead to improved teaching procedures and, thus, indirectly to
improved pupil learning
As we translate the various learning outcomes into test items, we develop a better notion
of the mental processes involved. Thus, the function of understanding, thinking skills, and other
complex learning outcomes becomes clearer to us. This clarification of how achievement is
reflected in mental processes enables us to plan the pupils' learning experiences more
effectively. Furthermore, we are also more apt to emphasize understanding, thinking skills, and
other complex learning outcomes in our teaching when we include them in our testing. This may
seem a case of the cart pulling the horse, but a well-constructed test frequently leads to a review
of teaching procedures and to the abandonment of those that encourage rote learning.

Finally, a test will contribute to improved teacher-pupil relations (with a
beneficial effect on pupil learning) if pupils view the test as a fair and useful
measure of their achievement
We can make its fairness apparent by including a representative sample of the learning tasks
that have been emphasized during instruction, by writing concise directions, by making certain
that the intent of each test item is clear and free of any bias that would prevent a knowledgeable
person from answering correctly, and by providing adequate time limits for the test. The pupils'
recognition of its usefulness, however, depends as much on what we do with the results of the
test as on the characteristics of the test itself. We can make the usefulness apparent by using the
results as a basis for guiding and improving learning.

Test Planning Considerations


This section will discuss those factors concerned with planning classroom tests, which include the following:

 Determining the purpose of testing
 Developing the test specifications
 Selecting appropriate item types
 Preparing relevant test items

The Purpose of Classroom Testing

Classroom tests can be used for a variety of instructional purposes. However, the various
uses of tests and other evaluation instruments can be classified into four types of classroom
evaluation:

1. Placement evaluation,

2. Formative evaluation,

3. Diagnostic evaluation, and

4. Summative evaluation.

Because teacher-made tests are useful in all four areas, this classification system provides a
convenient basis for considering the role of test purpose in planning the classroom test.

Placement Testing
Most placement tests constructed by classroom teachers are pretests designed to measure

1. whether pupils possess the prerequisite skills needed to succeed in a unit or
course, or

2. to what extent pupils have already achieved the objectives of the planned
instruction.

In the first instance we are concerned with the pupils' readiness to begin the instruction. In
the second, we are concerned with the appropriateness of our planned instruction for the group
and with proper placement of each pupil in the instructional sequence. Pretests for determining
prerequisite skills are typically rather limited in scope. For example, a pretest in algebra might be
confined to computational skill in arithmetic; a pretest in science might consist solely of science
terms; and a pretest in beginning German might be limited to knowledge of English grammar.

In addition to being confined to a small area of knowledge or skill, the readiness pretest
also tends to have a relatively low level of difficulty. This is because this type of pretest is used
to determine whether pupils have the minimum essentials needed to proceed with the course or
unit of work. Pretests of this type are typically criterion-referenced tests (i.e., tests designed to
describe the learning tasks pupils can perform), because their major function is to identify the
presence or absence of prerequisite skills.

Formative Testing
Formative tests are given periodically during instruction to monitor pupils' learning
progress and to provide ongoing feedback to pupils and teacher. Formative testing reinforces
successful learning and reveals learning weaknesses in need of correction. A formative test
typically covers some predefined segment of instruction and encompasses a rather limited
sample of learning tasks; the test items may be easy or difficult, depending on the learning tasks
in the segment of instruction being tested. Formative tests are typically criterion-referenced
mastery tests, but norm-referenced survey tests can also serve this function. Ideally, the test will
be constructed in such a way that corrective prescriptions can be given for missed test items or
sets of test items. Because the main purpose of the test is to improve learning, the results are
seldom used for assigning grades.
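The idea of attaching a corrective prescription to each missed test item or set of items can be pictured as a simple lookup. The item numbers and review materials below are entirely hypothetical:

```python
# Hypothetical corrective prescriptions, keyed by test item number.
prescriptions = {
    1: "Review Section 2.1: place value",
    2: "Review Section 2.2: addition with carrying",
    3: "Redo practice set B: two-digit addition",
}

def corrective_feedback(missed_items):
    """Collect the review prescriptions for the items a pupil missed."""
    return [prescriptions[item] for item in missed_items if item in prescriptions]

feedback = corrective_feedback([2, 3])
```

The point of the sketch is that a well-planned formative test maps each item back to the instruction it samples, so feedback can be generated directly from the list of missed items.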

Diagnostic Testing
Diagnosis of persistent learning difficulties involves much more than diagnostic testing,
but such tests are useful in the total process. The diagnostic test takes up where the formative
test leaves off: if pupils do not respond to the feedback-corrective prescriptions of formative
testing, a more detailed search for the source of learning errors is indicated. For this type of
testing, we will need to include a number of test items in each specific area, with some slight
variation from item to item. In diagnosing pupils' difficulties in adding whole numbers, for
example, we would want to include addition problems containing various number combinations,
some not requiring carrying and some requiring carrying, to pinpoint the specific types of
error each pupil is making. Because our focus is on the pupils' learning difficulties, diagnostic
tests must be constructed in accordance with the most common sources of error that pupils
encounter. Such tests are typically confined to a limited area of instruction, and the test items
tend to have a relatively low level of difficulty.

Summative Testing
The summative test is given at the end of a course or unit of instruction, and the results
are used primarily for assigning grades or certifying pupil mastery of the instructional objectives.
The results can also be used for evaluating the effectiveness of the instruction. The end-of-course
test (final examination) is typically a norm-referenced survey test that is broad in coverage and
includes test items with a wide range of difficulty. The more restricted end-of-unit summative
test might be norm-referenced or criterion-referenced, depending on whether mastery or
developmental outcomes are the focus of instruction.

Development of Test Specifications


The only assurance we have that a classroom test validly measures the instructional
objectives and course content we are interested in testing is to use some systematic procedure
for obtaining a representative sample of pupil performance in each of the areas to be measured.
One device that has been widely used for this purpose is the two-way chart, called a table of
specifications. This chart relates the instructional objectives to the course content and specifies
the relative emphasis to be given to each type of learning outcome.

Building a Table of Specifications

Building a table of specifications includes:

1. Obtaining a list of instructional objectives,


2. Outlining the course content, and
3. Preparing the two-way chart

Obtaining a list of Instructional Objectives


Many aspects of pupil performance can be measured by means of paper-and-pencil tests.
In fact, a novice in the area of measurement is frequently surprised at the variety of learning
outcomes that can be measured in this manner. Thus, all of the intended outcomes of instruction
should be considered when planning the classroom test. If a comprehensive list of instructional
objectives and specific learning outcomes has been prepared, it is simply a matter of selecting
those outcomes that can be measured by paper-and-pencil tests. If such a list is not available, a
set of instructional objectives can be prepared for the classroom test.

Outlining the Course Content
The list of instructional objectives describes the types of performance the pupils are
expected to demonstrate (e.g., knows, understands, applies), and the course content indicates the
area in which each type of performance is to be shown. Thus, the second step in preparing the
test specifications is to outline the course content. This may be simply a list of major topics to be
covered during the course, or a more detailed list of topics and subtopics. The amount of detail
in the content outline depends on the purpose of the test, the segment of the course covered, and
the type of test interpretation to be used. A criterion-referenced test (used to describe the
learning tasks pupils can perform), for example, will require a much more detailed description of
both objectives and content than will a norm-referenced test used to rank pupils in order of
achievement.

Preparing the Two-Way Chart


The final step in building a table of specifications is to prepare the two-way chart that
relates the instructional objectives to the course content and thus specifies the nature of the test
sample. The relative emphasis to be given to each instructional objective and content area should
reflect the emphasis of the instruction. In assigning relative weights, both the importance the
teacher attaches to the learning outcome and the amount of instructional time devoted to it can
serve as guidelines.
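One way to picture the chart's role is to let the relative weight in each cell determine how many items the test devotes to that content-area-by-objective combination. The topics, objectives, and weights below are invented purely for illustration:

```python
# Relative emphasis (fractions summing to 1) for each
# content area x objective cell; values are purely illustrative.
weights = {
    ("Fractions", "Knows terms"):        0.10,
    ("Fractions", "Applies procedures"): 0.25,
    ("Decimals",  "Knows terms"):        0.15,
    ("Decimals",  "Applies procedures"): 0.50,
}

def allocate_items(weights, total_items):
    """Turn relative weights into item counts for a test of total_items."""
    return {cell: round(w * total_items) for cell, w in weights.items()}

plan = allocate_items(weights, 40)
# e.g., ("Decimals", "Applies procedures") receives 20 of the 40 items
```

In practice the weights would come from the instructional emphasis and time allocations described above, and the counts might be adjusted by hand so they sum exactly to the intended test length.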

Q#03

Explain the principles of appropriate marking. Highlight the challenges for a
teacher in using these principles in classroom testing.

Basic Principles of a Good Marking System


If instructional objectives have been clearly defined in performance terms and evaluation
procedures have been effectively applied, the task of reporting pupils' progress will be greatly
simplified. It is still a rather perplexing one, however, as the evaluative data usually must be
summarized into a single letter grade or, at best, a very brief form. The process is a highly
subjective one, for which there are relatively few helpful guidelines. This has led to the use of
marks and progress reports which vary widely in composition and meaning.

The greatest confusion arises when pupil progress is summarized as a single letter grade
(e.g., A, B, C, D, E, F). The questions involved can be stated as under:

1. Should the assigned mark represent level of achievement, gain in achievement,
or some combination of the two?

2. Should effort be included? Or should high achievers be given good marks
regardless of their effort?

3. Should pupils be marked according to their own potential learning ability or in
relation to their classmates' achievements?

There are no simple answers to such questions. Practice varies from school to school, and
frequently from teacher to teacher within the same school system. Therefore, it is better to point
out the basic principles of a good marking system.

According to Chand (1990), the basic principles of a good marking system are as under:

1. A marking system should be clear and definite, so that it can easily be
comprehended by the pupils, teachers, and parents.

2. A marking system should be realistic, reasonable, and as true to human life
patterns as possible.

3. A marking system should provide a sufficient range of grades, so that various
degrees of attainment can be indicated reliably.

4. A marking system should be based on objective measures or standards that can
be checked objectively or rated consistently with a high degree of reliability.

5. A marking system should utilize statistical procedures in converting scores into
grades.

6. A marking system must be used as a means to an end and not as an end in itself.
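Principle 5 (using statistical procedures to convert scores into grades) might be sketched, for example, as a percentile-based conversion within a class. The cut points and pupil names below are arbitrary illustrations, not a recommended standard:

```python
def percentile_grades(scores, cuts=((90, "A"), (70, "B"), (40, "C"), (10, "D"))):
    """Assign letter grades from each pupil's percentile rank in the class.
    `cuts` lists (minimum percentile, grade) pairs, highest first; any
    percentile below the last cut receives an 'F'. The cut points are
    illustrative only."""
    n = len(scores)
    grades = {}
    for name, score in scores.items():
        below = sum(1 for s in scores.values() if s < score)
        percentile = 100 * below / n
        grades[name] = next((g for cut, g in cuts if percentile >= cut), "F")
    return grades

grades = percentile_grades({"Ali": 92, "Sara": 78, "Omar": 55, "Zain": 40})
# -> {"Ali": "B", "Sara": "C", "Omar": "D", "Zain": "F"}
```

Note that a purely norm-referenced conversion like this ranks pupils against each other; whether that is appropriate depends on the purpose of the marking, as the questions at the start of this answer make clear.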

When a teacher is aware of the basic principles of a good marking system, the next
step should be awareness of educational reporting.

Educational Reporting
Educational reporting has been generally defined as the communication of educational
outcomes. When, how, why, to whom, and under what conditions should outcomes be reported?
Answers to these questions are based on the assumption that educators have a threefold
responsibility:

a) To report outcomes accurately.

b) To make sure that these outcomes are properly interpreted.

c) To respect every person's right of privacy.

To share the above-mentioned threefold responsibility, it is necessary to let the teacher
know about the prerequisites for reporting.

Prerequisites for Reporting
Logic dictates the following prerequisites for proper reporting:

1) The first requirement of educational reporting is having information to report.
2) In addition to gathering information to report, the teacher must also sort or categorize it. This sorting makes the reporting task much easier and more useful.
3) Academic achievements in each course or subject should be kept separate.
4) Attendance, social behaviour and study characteristics should not be
mixed.
5) The next prerequisite for effective reporting is knowing the characteristics of one's observations or recorded scores. For example, parents should not be expected to understand the technical aspects of reliability and validity, but a teacher's knowledge of these concepts will influence his or her reporting procedures. It is therefore better for a teacher to know the quality of his or her observations, which will enable him or her to make reliable and valid reports.
6) Before reporting, a survey may be conducted to find out what parents and students care about.
7) In order to provide educational reporting, you must understand the statistical characteristics of the information and find the best way to present it.

After the basic principles of a good marking system, educational reporting and its prerequisites, the functions of marks and grades should also be made clear to the teacher, so that an ideal progress report can be provided to parents and pupils.

Functions of Marks and Progress Reports


School marks and other reports of pupils' progress serve a variety of functions in the schools, which can best be described in relation to the reports' users, including:

1. Reports to the Pupils and Parents

2. Use of Reports by Teachers and Counselors

3. Use of Reports by Administrators

Reports to the Pupils and the Parents

The main reason for reporting to pupils and parents is to facilitate the pupil's learning and development. Therefore, the reports should:

a) Clarify the school program's objectives.

b) Indicate the pupil's strengths and weaknesses in learning.

c) Promote greater understanding of the pupil's personal-social development, and

d) Contribute to the pupil's motivation.

From the standpoint of pupil learning, most of these functions are probably best served by day-to-day evaluation and feedback during instruction. However, there is also a need for a periodic summary of progress. Pupils find it difficult to integrate test scores, ratings and other evaluation results into an overall appraisal of their success in attaining school objectives, so the periodic progress report supplies this summary appraisal in addition to giving pupils a general picture of how they are doing. Such reports also give them a basis for checking the adequacy of their own self-estimates of learning progress.

Uses of Reports by Teachers and Counselors


Marks and progress reports contribute to the school's instructional and guidance programmes by providing more information about pupils. Such reports supplement and complement test scores and other evaluation data in the cumulative records. If pupils' past achievements are known, we can better understand their present strengths and weaknesses and can better predict the areas in which they are likely to be successful. The increased information supplied by progress reports is especially useful to teachers when they are planning instruction, diagnosing learning difficulties and coping with special problems of personal-social development. Counselors use the reports, along with other information, to help pupils develop better self-understanding and make more realistic educational and vocational plans. Many progress reports also contain information useful in counseling pupils with emotional problems.

The school's instructional and guidance functions seem to be best served by a reporting system that is both comprehensive and diagnostic. To guide learning effectively, aid personal-social development and help with future planning, teachers and counselors need detailed information concerning the pupil's abilities and disabilities in each area of development.

Use of Reports by Administrators
Marks and progress reports serve a number of administrative functions. They are used for determining promotion and graduation, awarding honors, determining athletic eligibility, and reporting to other schools and prospective employers. For most administrative purposes, a single letter grade is preferable, largely because such marks are compact and can easily be recorded and averaged. With the increased use of machines for routine clerical work, this advantage will probably become even more important in the future.

There is little doubt that the convenience of the single mark in administrative work has been a major factor in retarding the development of more comprehensive and useful progress reports. This need not be the case, however. When a new reporting system is developed, it is possible to use letter grades for administrative purposes and supplement them with the type of information needed by pupils, parents, teachers and counselors. At the high school level the retention of letter grades is almost mandatory, because most college admission officers insist on them.

Q#4

Discuss the latest trends in classroom testing in the global and local context. How can instruction play an important role in identifying weaknesses among students?

Latest Trends in Classroom Testing in the Global Context

1. Movement away from traditional assessment delivery methods.

2. The end of the road for pen and paper.

3. Much more engaging and effective assessment.

4. Increasing levels of automation.

5. Assessments are much more candidate centric.

These trends have a wide-ranging impact on many different organizations, including corporations, professional membership bodies, educational institutes, training companies and government departments.

1. Movement away from traditional assessment delivery methods

The use of professional remote invigilation, which recreates the exam hall experience in an online environment, means there is a move away from traditional assessment delivery methods, such as running exams in a test center. Remote invigilation (also known as online
proctoring) means that a secure exam can be run from any location as long as there is an internet
connection. This gives a great deal of flexibility to candidates, who can sit their exam at a time
and place that suits them, rather than spend time and incur costs associated with taking time off
and travelling to a test center.

Live remote invigilation happens in real-time. This means that for the duration of an
exam, an invigilator watches the candidate using video, audio and remote screen share. The
session is recorded and can be reviewed at a later stage if required. Any infringements can be raised as they happen, e.g. if the candidate keeps looking away from the screen, the candidate will be advised to stop this behaviour. If infringements are severe, e.g. the candidate takes a phone call or someone else comes into the room, the exam may be immediately stopped.

For organizations, the benefits of remote invigilation are numerous, such as a significantly reduced administration overhead, greater security and the ability to cater for
candidates in any country worldwide. Exams can also be offered with greater frequency, so
instead of one long test available once or twice a year, there may be multiple shorter tests run
closer to the period of tuition.

The use of remote invigilation really is a game changer. To give an example, at Test
Reach there was a candidate who was stuck in traffic and unable to get home in time to log in for
his exam. With the permission of the examining body, he pulled his car over to the side of the
road and sat his invigilated exam from his car, using the hotspot on his phone to connect to the
internet. Suffice it to say, he passed his exam and gained his diploma! It's a long way from a cold exam hall and writing with pen and paper for 3 hours.

2. The end of the road for pen and paper

That brings us to another big change in the world of assessment, and that is the move
away from using pen and paper as an exam delivery method.

Using pen and paper gives rise to many issues around administration and security, some
of which are outlined below:

 There is a huge administration burden with printing, transporting, marking and storing papers.
 There are security issues with the transportation of papers and managing who has
access to them.
 There is a lack of real-time visibility and reporting.

 It is challenging to manage the paper flow – printing papers, collecting finished
scripts, sending them to markers, storage etc.
 It is easier for errors to be made in data entry and reporting.
 There is a much longer lead time to results.
 People are much less used to writing nowadays, as everyone works on keyboards
and screens – most professional candidates find it very difficult to write for hours.
 Writing with a pen requires a completely different approach to the way most
people actually work. We are now used to jotting down initial thoughts and then
editing them until we are happy with the final result. This kind of editing is not
possible on paper, without lots of crossings out – you have to think about and plan
what you are going to write, before putting pen to paper, which is a very different
approach and can put candidates at a disadvantage.
 Very often working with pen and paper is not in line with how we actually carry out our work in a day-to-day environment. No accountant prepares a set of accounts on paper; it is all done via spreadsheets, so why should they be penalised during an exam by having to use pen and paper?
For organizations who are running exams on paper, it is possible to move to online
delivery in phases, so it doesn't have to be a "big bang", high risk approach. For example, you
may create and manage the question bank online, but print paper-based exam papers. The paper
scripts can then be scanned and marked automatically or sent to relevant examiners where
manual marking is required. On-screen marking has developed dramatically in recent years, with a wide range of online options available.

3. Much more engaging and effective assessment

Another key trend has been the move towards the creation of a much more engaging and
effective assessment. Organizations no longer have to use only simple, one-dimensional multiple
choice and essay questions. With the move to online there is now a huge range of question types
available, which help to make assessments much more immersive.

Using a variety of question types gives greater insight into what people know and how
they apply that knowledge in practice. Multi-media options allow the use of videos, photographs,
audio playback, graphs, labeling, drag & drop and many others.

Having a flat pen and paper test allows very little flexibility; it's typically a one-size-fits-all approach. Candidates can be offered some choices, such as answer 3 questions out of 5, or if you are taking the advanced paper, move on to section 5, but these choices are typically extremely limited. Moving to computer-based assessment enables testing to be much more adaptive to meet the specific needs of the individual. At its simplest level this could be
branching logic, so if a candidate selects an option indicating that they have specialised in topic
A, then they are asked questions about topic A; candidates selecting topic B are asked about
topic B, etc. There are also more comprehensive levels of adaptive testing, where question sets
are tailored based on how candidates have answered previous questions, or based on which
questions they have answered incorrectly, etc. This can be of use in situations where perhaps a
candidate is performing very poorly in questions on a particular area. In this case the candidate
might be asked simpler questions on that area, or alternative questions about other areas to at
least give them a chance to show the level of knowledge they have.
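The branching logic described above can be sketched in a few lines. This is a hypothetical illustration, not any vendor's actual implementation; the topic names, question bank and fallback rule are all invented for the example.

```python
# Hypothetical sketch of branching logic in a computer-based test: the
# candidate's declared specialism determines which question set is served.
# Topic names and question texts are invented for illustration.

QUESTION_BANK = {
    "topic_a": ["A1: define term X", "A2: apply method Y"],
    "topic_b": ["B1: define term P", "B2: apply method Q"],
}

def select_questions(declared_topic):
    """Branch on an earlier answer to choose the follow-up question set."""
    # Fall back to a default set if the declared topic is unknown.
    return QUESTION_BANK.get(declared_topic, QUESTION_BANK["topic_a"])

select_questions("topic_b")  # serves the topic-B question set
```

Real adaptive engines layer far more sophisticated rules on top of this idea, such as selecting the next question from how well the candidate answered the previous ones.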

4. Increasing levels of automation

Not surprisingly, there is a significant trend to increase the levels of assessment automation, and there are a number of ways that this is being done.

Many organizations are moving to a “LOFT” Model, which stands for linear-on-the-fly
testing. With this model organizations can set up question banks where each question can be
associated with different tags – these tags might be learning outcomes, syllabus topics, difficulty
rating or job roles, and then automatic picking rules can be defined for each exam paper. This
means that randomized papers can be generated for each candidate. Each test uses different
questions but they are an equivalent in terms of what they are assessing. The big positive is that
papers are generated automatically, so once the exam bank has been set up, it is just a case of
maintaining it, as opposed to having to continually generate new exam papers.

The LOFT model also reduces the ability for candidates to attempt to collaborate on
questions or share exam content online, as everyone is getting a different set of questions.

Because many questions can be auto-scored, results can be issued very quickly with clear
feedback on learning outcomes and areas for improvement.
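The tag-and-rule picking behind LOFT can be sketched as follows. This is a minimal illustration under assumed data: the bank items, tag names and picking rules are invented, not taken from any real assessment platform.

```python
import random

# Minimal sketch of LOFT (linear-on-the-fly testing) paper generation:
# each question carries tags, and picking rules of the form
# (topic, difficulty, count) draw a randomized but equivalent paper
# per candidate. All names and data here are illustrative.

bank = [
    {"id": 1, "topic": "algebra",  "difficulty": "easy"},
    {"id": 2, "topic": "algebra",  "difficulty": "hard"},
    {"id": 3, "topic": "algebra",  "difficulty": "easy"},
    {"id": 4, "topic": "geometry", "difficulty": "easy"},
    {"id": 5, "topic": "geometry", "difficulty": "hard"},
    {"id": 6, "topic": "geometry", "difficulty": "hard"},
]

def generate_paper(rules, rng):
    """Draw a random sample satisfying each (topic, difficulty, count) rule."""
    paper = []
    for topic, difficulty, count in rules:
        pool = [q for q in bank
                if q["topic"] == topic and q["difficulty"] == difficulty]
        paper.extend(rng.sample(pool, count))
    return paper

# Each candidate gets a different but equivalent paper: here, one easy
# algebra question and one hard geometry question.
rules = [("algebra", "easy", 1), ("geometry", "hard", 1)]
paper = generate_paper(rules, random.Random())
```

Because different candidates draw different items from each pool, sharing answers becomes much less useful, which is the anti-collusion benefit noted above.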

Another area where there are increasing levels of automation is systems integration.
Typically there are integration points so that candidate details can automatically flow from a
Learning Management System (LMS), a Customer Relationship Management (CRM) system or an HR system directly into the assessment solution. Candidates then get automatically enrolled in
the correct exam. There may be various other integration points, such as to push results from the
assessment system into a system of record after the exam.

There are also a lot of advancements being made in the area of automated marking, where
it’s now possible to auto-score a wide range of question types that can include text answers.

5. Assessments are much more candidate centric

At Test Reach we have a wide range of customers from many different backgrounds. We
work with educational institutes, professional bodies, governments, corporations – and one thing
they all have in common is placing a huge emphasis on the quality of the candidate experience.

It is essential that information and examinations are presented in a user-friendly way. Things like interactive canvasses are useful in this context, where case studies and questions can
be presented together on-screen, with the option to make notes, highlight and annotate. There are
also options for the candidate to configure the view in line with their own preferences. This type
of innovation means that all the features you’d expect from a paper-based exam are readily and
easily available in the online format.

Other ways in which online assessment makes things more candidate-friendly are the faster turnaround of results and the ease of providing detailed feedback. Candidates do
not want to just know their grade but also to understand areas in which they were strong and
those areas where they may need to improve. This is becoming a very important area of focus for
many organizations, as this kind of feedback to candidates provides a lot of benefits particularly
for students who fail.

6. There is increasing public concern about testing


Decisions concerning the selection, administration, and use of educational tests are no longer left to the educator alone; the public has become an active and vocal partner. At the state level, mandated assessment programmes have been imposed on the schools as a result of the public demand for evidence of the school programmes' effectiveness. In some states, the public-at-large has participated, through selected groups, in determining the objectives and standards of the statewide assessment programmes. In other states, in which competency testing has been made the responsibility of the local school district, parent groups have often helped to shape the programmes. It is interesting to note that the concern of state legislators and the general public with the quality of school programmes has created a demand for more testing in the schools, not less.

During the expansion of testing programmes there has also been some concern that there
is too much testing in the schools, especially for high school students. In addition to taking the
tests in the local school programmes, these students may also have to take one or more state
competency tests and several college admissions tests. It is feared that the heavy demand on their time and energy might distract them from their schoolwork and that external testing programmes may cause undesirable shifts in the school's curriculum.

Successful student learning is most effective with an aligned system of standards, curriculum, instruction, and assessment. When assessment is aligned with instruction, both students and teachers benefit. Students are more likely to learn because instruction is focused and because they are assessed on what they are taught. Teachers are also able to focus, making the best use of their time.

Teachers are responsible for providing instruction by identifying teaching practices that
are effective for all students, since not every student learns or retains information in the same
way. This is where teachers get to be creative in how they engage students in learning.

Assessments are the tools and methods educators use to determine what students know and are able to do. Assessments range from teacher questioning techniques to statewide assessments like
PARCC. Assessments are only useful if they provide information that is used to improve student
learning. Assessment inspires us to ask these hard questions: "Are we teaching what we think we
are teaching?" "Are students learning what they are supposed to be learning?" "Is there a way to
teach the subject better, thereby promoting better learning?"

There is greater emphasis on the use of tests to improve learning and instruction
The shift away from using tests primarily to evaluate students and schools can be seen at all levels of testing. In classroom testing there is increased emphasis on formative testing, the use of mastery and diagnostic tests, and the integration of testing and learning in individualized instructional programmes. With the increased use of computer scoring of results from local school-wide testing programs, the new emphasis on accountability and competency testing has resulted in greater concern with improved learning and instruction. In fact, some states have mandated that remedial instruction must be given to pupils scoring low on a competency test. The National Assessment of Educational Progress has helped shift the focus from test scores to descriptions of pupil learning by keying its assessment exercises to clearly stated objectives and by presenting its reports in terms of pupil performance on specific achievement tasks.

There are more uses of computers in testing


The use of computers has had a profound impact on the development of testing and can be
expected to have even greater influence in the future. From the test user's standpoint, the most
readily apparent contribution has been in the scoring and analysis of tests. Test publishers can
now provide test results in almost any form desired. A summary of results by class, building, and
district can be supplemented by item-analysis data for each item and by a printout for each pupil
showing correct and incorrect responses. A comparison with both national and local norms can
be made, and in many cases, interpretation by objective or content cluster is available. The type
of feedback made possible by the computer has been a major factor in the trend toward using test
results to improve learning and instruction.

Test User Competence


It is essential not only that tests be of high quality, but also that test users be qualified to use them. Tests are sophisticated psychological tools that can be used in harmful and/or ineffective as well as helpful ways. They should be used only by qualified and informed individuals who will make sure that they are used and interpreted effectively and correctly.
Accordingly, a major principle of the APA ethical code is that psychological knowledge and
techniques are used only by those qualified to use them and, conversely, that test users operate
only within the bounds of their own knowledge and competence. According to the APA code, it is important that psychologists who use test results have an understanding of psychological
measurement, problems of test validation, and test research. Therefore, it is important to ensure
that tests and test scores are used only by those people who are qualified to use them.

Qualifications for test use vary according to the types of tests in question, but in general are stricter for tests having greater potential for harm and misinterpretation. One of the earliest systems by which user qualifications were specified, provided by the first APA test standards, classified tests according to three levels of complexity.

Level A tests are those that can be administered, scored, and interpreted by responsible
non-psychologists who have carefully read the test manual and are familiar with the overall
purposes of testing. Educational achievement tests fall into this category.

Level B tests require technical knowledge of test construction and use, and appropriate advanced coursework in psychology and related courses (e.g. statistics, individual differences, and counseling). Vocational interest inventories, group intelligence and special aptitude tests, and some personality inventories are considered Level B tests. For example, Consulting Psychologists Press generally limits purchase of tests such as the Strong-Campbell Interest Inventory, the State-Trait Anxiety Inventory, the Myers-Briggs Type Indicator, and the Bem Sex Role Inventory to people who have completed university courses in tests and measurements or equivalent training. Similar requirements for access to tests such as the Jackson Vocational Interest Survey, the Personality Research Form and the Jackson Personality Inventory are stated by their publisher, Research Psychologists Press.

Level C tests require an advanced degree in psychology or licensure as a psychologist and advanced training/supervised experience with the particular test. Level C tests generally include individually administered intelligence tests and personality tests (e.g. the Stanford-Binet Intelligence Scale, the Wechsler Adult Intelligence Scale, and the Minnesota Multiphasic Personality Inventory). Graduate students may be qualified to purchase and use Level B or Level C tests if they are being supervised in that work by someone who does possess the appropriate user qualifications. Specific criteria for test user qualifications are contained in the catalogues
distributed by the major test publishers. Responsible test publishers not only list user qualifications but also require potential test purchasers to provide their credentials, so that tests are sold only to qualified people. But ensuring that test users are competent is also the responsibility of the administrators of any organization or agency using tests (e.g. schools and businesses), of test developers, who should make available complete information about the technical adequacy and use of the test, and of test users themselves, who, even after earning the appropriate degrees, must engage in continued study of testing research and techniques in order to be competent in their use. A qualified test user not only possesses the necessary education, training, and experience but is familiar with the technical psychometric characteristics of the test to be used, is able to select alternative tests rather than merely defend the use of the particular test selected, is knowledgeable about both administration and interpretation of the test, and is aware of the potential misuses of the test and the circumstances and/or types of individuals with whom particular care must be taken.

Q#05

Explain some situations in which the arithmetic mean of scores is not recommended to interpret performance or results.
A measure of central tendency is a single value that attempts to describe a set of data by
identifying the central position within that set of data. As such, measures of central tendency are
sometimes called measures of central location. They are also classed as summary statistics. The
mean (often called the average) is most likely the measure of central tendency that you are most
familiar with, but there are others, such as the median and the mode.

The mean, median and mode are all valid measures of central tendency, but under
different conditions, some measures of central tendency become more appropriate to use than
others. In the following sections, we will look at the mean, mode and median, and learn how to
calculate them and under what conditions they are most appropriate to be used.

Arithmetic mean
The arithmetic mean is the most commonly used average and is usually called simply the mean or the average. It is defined as the number obtained by dividing the sum of the scores by their number. It is denoted by putting a bar over the variable symbol, e.g. X̄ (read as "X bar"). Its formula is

Mean = ∑X / N
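The formula ∑X / N translates directly into a few lines of Python; the scores used here are purely illustrative.

```python
def arithmetic_mean(scores):
    """Mean = sum of the scores (the sum over X) divided by their number (N)."""
    return sum(scores) / len(scores)

arithmetic_mean([70, 80, 90])  # 80.0
```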

How the Arithmetic Mean Works

The arithmetic mean maintains its place in finance, as well. For example, mean earnings
estimates typically are an arithmetic mean. Say you want to know the
average earnings expectation of the 16 analysts covering a particular stock. Simply add up all the
estimates and divide by 16 to get the arithmetic mean.

The same is true if you want to calculate a stock’s average closing price during a
particular month. Say there are 23 trading days in the month. Simply take all the prices, add them
up, and divide by 23 to get the arithmetic mean.

The arithmetic mean is simple, and most people with even a little bit of finance and math
skill can calculate it. It’s also a useful measure of central tendency, as it tends to provide useful
results, even with large groupings of numbers.

Limitations of the Arithmetic Mean

The arithmetic mean isn't always ideal, especially when a single outlier can skew the
mean by a large amount. Let’s say you want to estimate the allowance of a group of 10 kids.
Nine of them get an allowance between $10 and $12 a week. The tenth kid gets an allowance of
$60. That one outlier is going to result in an arithmetic mean of $16. This is not very
representative of the group.

In this particular case, the median allowance of 10 might be a better measure.

The arithmetic mean also isn’t great when calculating the performance of investment
portfolios, especially when it involves compounding, or the reinvestment of dividends and
earnings. It is also generally not used to calculate present and future cash flows, which analysts
use in making their estimates. Doing so is almost sure to lead to misleading numbers.

The arithmetic mean can be misleading when there are outliers or when looking at
historical returns. The geometric mean is most appropriate for series that exhibit serial
correlation. This is especially true for investment portfolios.

Example of the Arithmetic vs. Geometric Mean


Let's say that a stock's returns over the last five years are 20%, 6%, -10%, -1%, and 6%.
The arithmetic mean would simply add those up and divide by five, giving a 4.2% per year
average return.

The geometric mean would instead be calculated as (1.2 × 1.06 × 0.9 × 0.99 × 1.06)^(1/5) − 1 ≈ 3.74% per year average return. Note that the geometric mean, a more accurate calculation in this case, will never be larger than the arithmetic mean.

Moreover, the mean is essentially a model of your data set. It is the value that is most
common. You will notice, however, that the mean is not often one of the actual values that you
have observed in your data set. However, one of its important properties is that it minimises error
in the prediction of any one value in your data set. That is, it is the value that produces the lowest
amount of error from all other values in the data set.

An important property of the mean is that it includes every value in your data set as
part of the calculation. In addition, the mean is the only measure of central tendency where the
sum of the deviations of each value from the mean is always zero.
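This zero-sum property of the deviations is easy to verify numerically; the scores below are illustrative.

```python
scores = [4, 8, 15, 16, 23, 42]
mean = sum(scores) / len(scores)        # 18.0

# The deviations of each value from the mean always sum to zero
# (up to floating-point rounding).
deviations = [x - mean for x in scores]
total = sum(deviations)                 # 0.0
```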

When not to use the mean

The mean has one main disadvantage: it is particularly susceptible to the influence of
outliers. These are values that are unusual compared to the rest of the data set by being especially
small or large in numerical value. For example, consider the wages of staff at a factory below:

Staff    1    2    3    4    5    6    7    8    9    10
Salary   15k  18k  16k  14k  15k  15k  12k  17k  90k  95k

The mean salary for these ten staff is $30.7k. However, inspecting the raw data suggests
that this mean value might not be the best way to accurately reflect the typical salary of a worker,
as most workers have salaries in the $12k to $18k range. The mean is being skewed by the two
large salaries. Therefore, in this situation, we would like to have a better measure of central
tendency. As we will find out later, taking the median would be a better measure of central
tendency in this situation.
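Python's standard `statistics` module makes the comparison for the salary table above direct:

```python
import statistics

# Salaries in $k for the ten staff in the table above.
salaries = [15, 18, 16, 14, 15, 15, 12, 17, 90, 95]

statistics.mean(salaries)    # 30.7, pulled upward by the two large salaries
statistics.median(salaries)  # 15.5, much closer to the typical worker's pay
```

The median sits in the middle of the sorted salaries, so the two outliers at $90k and $95k cannot drag it far from the typical value.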

Another time when we usually prefer the median over the mean (or mode) is when our
data is skewed (i.e., the frequency distribution for our data is skewed). If we consider the normal
distribution - as this is the most frequently assessed in statistics - when the data is perfectly
normal, the mean, median and mode are identical. Moreover, they all represent the most typical
value in the data set. However, as the data becomes skewed the mean loses its ability to provide
the best central location for the data because the skewed data is dragging it away from the typical
value. However, the median best retains this position and is not as strongly influenced by the
skewed values. This is explained in more detail in the skewed distribution section later in this
guide.

Conclusion

To calculate the central tendency for a given data set, we use different measures such as the mean, median and mode. Among these, the arithmetic mean is often considered the best measure because it includes every value of the data set. If any value in the data set changes, the mean will change, whereas the median or mode may remain unaffected.

Can arithmetic mean be negative?

Yes, the arithmetic mean can be negative. The data can be distributed anywhere. So, the
mean value can be negative or positive or zero.
