(6507) Educational Measurement and Evaluation New
Measurement
Measuring Behaviour
At the Measuring Behaviour conference, researchers present methods and techniques for gaining insight into pig behaviour in a stable, teacher behaviour in a classroom, visitor behaviour in a museum, crab behaviour in an open field, citizen behaviour in a smart city, consumer behaviour in a store, and much more.
Some of the important indicators/techniques which are employed to measure verbal and
non-verbal behaviour are mentioned below:
2. Duration of Response:
Another factor which is taken into consideration for measuring behaviour is the duration of time for which a particular behaviour or response occurs. Measurements of after-images and other such sensory experiences employ this type of index. Suppose you look at a bright green light.
The experienced greenness may remain for a moment even after you cease looking at the light. Similarly, when you hear a loud sound prolonged for some time, or inhale a strong perfume for a long time, the sound and the smell will remain for some time even after these stimuli are withdrawn.
3. Time Taken for Response:
In a maze-learning experiment, one of the criteria employed to measure whether a rat or cat has learnt the correct path is the time taken by the animal to reach its goal.
4. Frequency of Response:
The number of times a particular response occurs within a given time or on a particular
occasion is another indicator. An example of this type can be seen in the measurement of
fluctuation of attention. Experiments on fluctuation of attention employ, as an index, the number
of times attention shifts from one aspect of a given stimulus to another within a stipulated time
limit.
5. Amount of Response:
In measuring emotional behaviour, the amount or intensity of glandular and muscular responses is employed as an indicator. If a person's aggression has to be measured, the experimenter may try to measure the subject's blood pressure, rate of respiration, rate of heartbeat, gestures, tone, and facial and other expressions accompanying certain physiological changes. Only after analyzing and combining a variety of such data can one arrive at a measure indicating the overall or total amount of aggressive reaction.
For instance, in Binet's intelligence test the items are arranged in such a manner that their complexity and difficulty increase gradually as the test advances. Thus, we see that psychologists employ different kinds of measures depending on the nature of the behaviour and the purpose of the measurement.
Personality inventories
Personality inventories, also called objective tests, are standardized and can be
administered to a number of people at the same time. A psychologist need not be present when
the test is given, and the answers can usually be scored by a computer. Scores are obtained by
comparison with norms for each category on the test. A personality inventory may measure one
factor, such as anxiety level, or it may measure a number of different personality traits at the
same time, such as the Sixteen Personality Factor Questionnaire (16 PF).
Explanation
A personality inventory is a self-assessment tool that career counselors and other career
development professionals use to help people learn about their personality types. It reveals
information about individuals' social traits, motivations, strengths and weaknesses, and attitudes.
Experts believe these factors play an important role in job and career success and satisfaction.
Cornell Index
This was developed during World War II, and the civilian revision of this questionnaire contains 101 yes-no items pertaining to feelings of fear and inadequacy, depression and other pathological mood reactions, nervousness and anxiety, and several kinds of psychosomatic symptoms. The time required to complete the inventory varies from 5 to 15 minutes.
Emotional. An additional score has been provided in the adult form to measure occupational adjustment. Answers are recorded by encircling 'Yes' or 'No'.
Bernreuter Inventory
It consists of 125 items and is designed to measure six scores: (1) Neuroticism, (2) Self-sufficiency, (3) Introversion, (4) Dominance, (5) Confidence, (6) Sociability. The last two were added by Flanagan.
The manual provides norms on all six scores for high school, college, and general adult populations.
The inventory appears to be more effective with normal and near-normal subjects than with psychotics.
Guilford Inventory
The latest inventory in the Guilford series consists of 300 items, 30 for each of the following 10 scores:
G. General Activity
R. Restraint
A. Ascendance
S. Sociability
E. Emotional Stability
O. Objectivity
F. Friendliness
T. Thoughtfulness
P. Personal Relations
M. Masculinity
Cattell’s Inventory
Cattell has devised the Sixteen Personality Factor Questionnaire. This inventory is available in two parallel forms, A and B, each containing 187 items. The use of both forms is advocated for greater reliability.
Evaluation of Personality Inventories
The construction and use of personality inventories are beset with a number of special difficulties.
Q#02
How can the length and item difficulty of a test influence the appropriate assessment of students? Explain other considerations that help to develop appropriate test items.
Test length
The length of a test is also an important factor in obtaining a representative sample. Test length is determined when the set of specifications is built and depends on such factors as the purpose of testing, the types of test items used, the age of the pupils, and the level of reliability needed for effective test use. Thus, a criterion-referenced mastery test over third-grade social studies might contain 30 objective items, whereas a norm-referenced survey test over a tenth-grade social studies course might contain more than 100 objective items and several essay questions. Although there are no hard and fast rules for determining test length, an important consideration from a sampling standpoint is the number of test items devoted to each specific area being measured. We want our classroom test to be long enough to provide an adequate sampling of each objective and each content area. As a rule of thumb, it is desirable when constructing a criterion-referenced mastery test to use at least ten objective test items to measure each specific learning outcome. This number, however, might be lowered to as few as five if the task is extremely limited (e.g., "Adds two single-digit numbers", "Capitalizes proper names") and the pupils are to supply the answers rather than select them. For a norm-referenced test, where the sample of test items typically covers a broad area and emphasis is on the total score, using several objective test items for each specific learning outcome and ten or more for each general objective would probably be sufficient.
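As a rough sketch, the rules of thumb above can be expressed as a small helper. The thresholds (ten items per outcome, five for very limited supply-type tasks) come from the text; the function and variable names are illustrative only.

```python
# Rules of thumb for criterion-referenced mastery test length.

def items_for_mastery_outcome(task_is_limited, pupils_supply_answer):
    """Minimum objective items per specific learning outcome on a
    criterion-referenced mastery test."""
    if task_is_limited and pupils_supply_answer:
        return 5   # e.g. "Adds two single-digit numbers"
    return 10

def mastery_test_length(outcomes):
    """Total test length is the sum of the per-outcome minimums."""
    return sum(items_for_mastery_outcome(lim, sup) for lim, sup in outcomes)

# Three broad outcomes plus two very limited supply-type tasks:
outcomes = [(False, False), (False, False), (False, True), (True, True), (True, True)]
print(mastery_test_length(outcomes))  # 10 + 10 + 10 + 5 + 5 = 40
```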
Special problems of sampling arise when complex learning outcomes are being measured, because here we must turn to more elaborate objective-type items and essay questions. Both item types require considerable testing time, yet a single test exercise is still inadequate for measuring the intended outcome. In one exercise calling for the interpretation of graphs, the nature of the data or the type of graph may be the most influential factor in determining whether it is interpreted properly. When several graphs are used, the effect of such factors is minimized, and we obtain a more representative sample of the ability to interpret graphs. A similar situation occurs with the use of essay questions. The answer to any single question depends too heavily on the particular sample of information called for by the question, and thus the only feasible course is to confine each test of complex outcomes to a rather limited area (e.g., graph interpretation, problem solving) and to test more often. In any event, our aim should be to obtain as representative a sample of pupil performance as possible in each area to be tested. Other things being equal, the greater the number of test items, the greater the likelihood of an adequate sample and thus the more reliable the results.
For a criterion-referenced test at the developmental level of learning, we need test items of varying difficulty for each instructional objective. Ideally, the difficulty of the test tasks would be derived directly from the instructional content, rather than from some arbitrary attempt to manipulate item difficulty. It should also be kept in mind that in criterion-referenced mastery testing, a wide range of scores is not expected.
Item Difficulty and Norm-Referenced Testing. Because norm-referenced tests are designed to rank students in order of achievement, deliberate attempts are made to obtain a wide spread of scores. That is why easy items are eliminated: to maximize the differences in students' performance. Maximum differentiation among students in terms of achievement is obtained when the average score is near the midpoint of the possible scores and the scores range from near zero to near perfect. The average difficulty to aim for on a 100-item test varies with the type of choice item used.
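The original table of recommended difficulty values is not reproduced here. A common rule of thumb, used below as an assumption rather than taken from the text, places the ideal average item difficulty halfway between the chance score (what blind guessing yields) and a perfect score:

```python
# Ideal average item difficulty for a norm-referenced test, assuming
# the "halfway between chance and perfect" rule of thumb.

def ideal_difficulty(num_choices):
    chance = 1.0 / num_choices   # proportion correct by blind guessing
    return (chance + 1.0) / 2    # midpoint between chance and 1.0

for k in (2, 3, 4, 5):
    print(k, round(ideal_difficulty(k), 2))  # e.g. 2 choices -> 0.75, 5 -> 0.6
```

Supply-type items, where guessing yields essentially zero, would by the same rule aim for an average difficulty of about .50.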
One way of ensuring that tests have a desirable influence on pupil learning is to pay particular attention to the breadth of content and learning outcomes measured by the tests.
When we select a representative sample of content from all of the areas covered in our instruction, we are emphasizing to our pupils that they must devote attention to all areas: they cannot neglect some aspects of the course and still do well on the tests. Similarly, when our tests measure a variety of types of learning outcomes, the pupils soon learn that they must do more than memorize facts; they must also develop conceptual understandings, draw conclusions, recognize assumptions, identify cause-and-effect relations, and the like. This discourages them from depending solely on memorization as a basis for learning and encourages them to use more complex mental processes.
Finally, a test will contribute to improved teacher-pupil relations (with a beneficial effect on pupil learning) if pupils view the test as a fair and useful measure of their achievement.
We can make fairness apparent by including a representative sample of the learning tasks
that have been emphasized during instruction, by writing concise directions, by making certain
that the intent of each test item is clear and free of any bias that would prevent a knowledgeable
person from answering correctly, and by providing adequate time limits for the test. The pupils' recognition of usefulness, however, depends as much on what we do with the results of the test as on the characteristics of the test itself. We can make the usefulness apparent by using the results as a basis for guiding and improving learning.
The Purpose of Classroom Testing Classroom tests can be used for a variety of instructional
purposes. However, the various uses of tests and other evaluation instruments can be classified
into four types of classroom evaluation:
1. Placement evaluation,
2. Formative evaluation,
3. Diagnostic evaluation,
4. Summative evaluation.
Because teacher-made tests are useful in all four areas, this classification system provides a convenient basis for considering the role of test purpose in planning the classroom test.
Placement Testing
Most placement tests constructed by classroom teachers are pretests designed to measure:
1. To what extent pupils possess the skills and abilities needed to begin the planned instruction.
2. To what extent pupils have already achieved the objectives of the planned instruction.
In the first instance we are concerned with the pupils' readiness to begin the instruction. In the second we are concerned with the appropriateness of our planned instruction for the group and with the proper placement of each pupil in the instructional sequence. Pretests for determining prerequisite skills are typically rather limited in scope. For example, a pretest in algebra might be confined to computational skill in arithmetic, and a pretest in science might consist solely of science terms.
Formative Testing
Formative tests are given periodically during instruction to monitor pupils' learning progress and to provide ongoing feedback to pupils and teacher. Formative testing reinforces successful learning and reveals learning weaknesses in need of correction. A formative test typically covers some predefined segment of instruction and encompasses a rather limited sample of learning tasks; the test items may be easy or difficult, depending on the learning tasks in the segment of instruction being tested. Formative tests are typically criterion-referenced mastery tests, but norm-referenced survey tests can also serve this function. Ideally the test will be constructed in such a way that corrective prescriptions can be given for missed test items or sets of test items. Because the main purpose of the test is to improve learning, the results are seldom used for assigning grades.
Diagnostic Testing
Diagnosis of persistent learning difficulties involves much more than diagnostic testing, but such tests are useful in the total process. The diagnostic test takes up where the formative test leaves off: if pupils do not respond to the feedback-corrective prescriptions of formative testing, a more detailed search for the source of learning errors is indicated. For this type of testing, we will need to include a number of test items in each specific area, with some slight variation from item to item. In diagnosing pupils' difficulties in adding whole numbers, for example, we would want to include addition problems containing various number combinations, some requiring carrying and some not, to pinpoint the specific types of error each pupil is making. Because our focus is on the pupils' learning difficulties, diagnostic tests must be constructed in accordance with the most common sources of error that pupils encounter. Such tests are typically confined to a limited area of instruction, and the test items tend to have a relatively low level of difficulty.
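A diagnostic item set of the kind described above can be sketched as follows. The digit ranges and item counts are illustrative; the point is simply to separate carrying from non-carrying problems so the error type can be pinpointed:

```python
# Generate diagnostic whole-number addition items, split by whether
# the problem requires carrying.
import random

def needs_carrying(a, b):
    """True if adding the two numbers requires carrying in any column."""
    while a > 0 and b > 0:
        if a % 10 + b % 10 >= 10:
            return True
        a //= 10
        b //= 10
    return False

def make_items(n, carrying, rng):
    """Draw n two-digit addition problems that do (or do not) need carrying."""
    items = []
    while len(items) < n:
        a, b = rng.randint(10, 99), rng.randint(10, 99)
        if needs_carrying(a, b) == carrying:
            items.append((a, b))
    return items

rng = random.Random(1)
print("no carrying:", make_items(3, carrying=False, rng=rng))
print("carrying:   ", make_items(3, carrying=True, rng=rng))
```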
Summative Testing
The summative test is given at the end of a course or unit of instruction, and the results
are used primarily for assigning grades or certifying pupil mastery of the instructional objectives.
The results can also be used for evaluating the effectiveness of the instruction. The end-of-
course test (final examination) is typically a norm-referenced survey test that is broad in
coverage and includes test items with a wide range of difficulty. The more restricted end- of-unit
summative test might be norm referenced or criterion referenced, depending on whether mastery
or developmental outcomes are the focus of instruction.
Outlining the Course Content
The list of instructional objectives describes the types of performance the pupils are expected to demonstrate (e.g., knows, understands, applies), and the course content indicates the area in which each type of performance is to be shown. Thus the second step in preparing the test specifications is to outline the course content. This may be simply a list of major topics to be covered during the course or a more detailed list of topics and subtopics. The amount of detail in the content outline depends on the purpose of the test, the segment of the course covered, and the type of test interpretation to be used. A criterion-referenced test (used to describe the learning tasks pupils can perform), for example, will require a much more detailed description of both objectives and content than will a norm-referenced test used to rank pupils in order of achievement.
Q#03
The greatest confusion arises when pupil progress is summarized as a single letter grade (e.g., A, B, C, D, E, F). It can be explained as under:
2. Should effort be included? Or should high achievers be given good marks regardless of their effort?
There are no simple answers to such questions. Practice varies from school to school, and frequently from teacher to teacher within the same school system. Therefore, it is better to point out the basic principles of a good marking system.
According to Chand (1990), the basic principles of a good marking system are as under:
1. A marking system should be clear and definite, so that it can easily be comprehended by the pupils, teachers, and parents.
6. A marking system must be used as a means to an end and not as an end in itself.
When a teacher is aware of the basic principles of a good marking system, the next step should be awareness of educational reporting.
Educational Reporting
Educational reporting has been generally defined as the communication of educational
outcomes. When, how, why, to whom, and under what conditions should outcomes be reported?
Answers to these questions are based on the assumption that educators have a threefold responsibility.
To share the above-mentioned threefold responsibility, it is necessary to let the teacher know about the prerequisites for reporting.
Prerequisites for Reporting
Logic dictates the following prerequisites for proper reporting:
After knowing about the basic principles of a good marking system, educational reporting, and its prerequisites, the functions of marks and grades should also be made clear to the teacher, so that an ideal progress report can be prepared for the parents and the pupils.
Reports to the Pupils and the Parents
The main reason for reporting to pupils and parents is to facilitate the pupils' learning and development, and the reports should serve this end.
From the standpoint of pupil learning, most of these functions are probably best served by the day-to-day evaluation and feedback during instruction. However, there is also a need for a periodic summary of progress. Pupils find it difficult to integrate test scores, ratings, and other evaluation results into an overall appraisal of their success in attaining school objectives, and so the periodic progress report supplies this summary appraisal. In addition to giving pupils a general picture of how they are doing, such reports also give them a basis for checking the adequacy of their own self-estimates of learning progress.
The school's instructional and guidance functions seem to be best served by a reporting system that is both comprehensive and diagnostic. To guide learning effectively, aid personal-social development, and help with future planning, teachers and counselors need detailed information concerning the pupils' abilities and disabilities in each area of development.
Use of Reports by Administrator
Marks and progress reports serve a number of administrative functions. They are used for
determining promotion and graduation, awarding honors, determining athletic eligibility, and
reporting to other schools and prospective employers. For most administrative purposes, a single letter grade is preferable, largely because such marks are compact and can easily be recorded and averaged. With the increased use of machines for routine clerical work, this advantage will probably become even more important in the future.
There is little doubt that the convenience of the single mark in administrative work has been a major factor in retarding the development of more comprehensive and useful progress reports. This need not be the case, however. When a new reporting system is developed, it is possible to use letter grades for administrative purposes and supplement them with the type of information needed by pupils, parents, teachers, and counselors. At the high school level the retention of letter grades is almost mandatory, because most college admission officers insist on them.
Q#4
Discuss the latest trends in classroom testing in the global and local context. How can instruction play an important role in testing the weaknesses among students?
The use of professional remote invigilation, which recreates the exam hall experience in an online environment, means there is a move away from the use of traditional assessment delivery methods, such as running exams in a test center. Remote invigilation (also known as online proctoring) means that a secure exam can be run from any location as long as there is an internet connection. This gives a great deal of flexibility to candidates, who can sit their exam at a time and place that suits them, rather than spend time and incur the costs associated with taking time off and travelling to a test center.
Live remote invigilation happens in real-time. This means that for the duration of an
exam, an invigilator watches the candidate using video, audio and remote screen share. The
session is recorded and can be reviewed at a later stage if required. Any infringements can be raised as they happen, e.g., if the candidate keeps looking away from the screen, the candidate will be advised to stop this behaviour. If infringements are severe, e.g., the candidate takes a phone call or someone else comes into the room, the exam may be immediately stopped.
The use of remote invigilation really is a game changer. To give an example, at Test
Reach there was a candidate who was stuck in traffic and unable to get home in time to log in for
his exam. With the permission of the examining body, he pulled his car over to the side of the
road and sat his invigilated exam from his car, using the hotspot on his phone to connect to the
internet. Suffice to say, he passed his exam and gained his diploma! It’s a long way from a cold
exam hall and writing with pen and paper for 3 hours…
That brings us to another big change in the world of assessment, and that is the move
away from using pen and paper as an exam delivery method.
Using pen and paper gives rise to many issues around administration and security, some
of which are outlined below:
- It is challenging to manage the paper flow: printing papers, collecting finished scripts, sending them to markers, storage, etc.
- It is easier for errors to be made in data entry and reporting.
- There is a much longer lead time to results.
- People are much less used to writing nowadays, as everyone works on keyboards and screens; most professional candidates find it very difficult to write for hours.
- Writing with a pen requires a completely different approach to the way most people actually work. We are now used to jotting down initial thoughts and then editing them until we are happy with the final result. This kind of editing is not possible on paper without lots of crossings out; you have to think about and plan what you are going to write before putting pen to paper, which is a very different approach and can put candidates at a disadvantage.
- Very often working with pen and paper is not in line with how we actually carry out our work in a day-to-day environment. No accountant prepares a set of accounts on paper; it is all done via spreadsheets, so why should they be penalised during an exam by having to use pen and paper?
For organizations who are running exams on paper, it is possible to move to online delivery in phases, so it doesn't have to be a "big bang", high-risk approach. For example, you may create and manage the question bank online but print paper-based exam papers. The paper scripts can then be scanned and marked automatically, or sent to the relevant examiners where manual marking is required. On-screen marking has developed dramatically in recent years, with a wide range of online options available.
Another key trend has been the move towards the creation of a much more engaging and
effective assessment. Organizations no longer have to use only simple, one-dimensional multiple
choice and essay questions. With the move to online there is now a huge range of question types
available, which help to make assessments much more immersive.
Using a variety of question types gives greater insight into what people know and how
they apply that knowledge in practice. Multi-media options allow the use of videos, photographs,
audio playback, graphs, labeling, drag & drop and many others.
18
Having a flat pen and paper test allows very little flexibility; it's typically a one-size-fits-all approach. Candidates can be offered some choices, such as "answer 3 questions out of 5", or "if you are taking the advanced paper, move on to section 5", but these choices are typically extremely limited. Moving to computer-based assessment enables testing to be much more adaptive to meet the specific needs of the individual. At its simplest level this could be branching logic: if a candidate selects an option indicating that they have specialised in topic A, then they are asked questions about topic A; candidates selecting topic B are asked about topic B, etc. There are also more comprehensive levels of adaptive testing, where question sets are tailored based on how candidates have answered previous questions, or based on which questions they have answered incorrectly, etc. This can be of use in situations where perhaps a candidate is performing very poorly in questions on a particular area. In this case the candidate might be asked simpler questions on that area, or alternative questions about other areas, to at least give them a chance to show the level of knowledge they have.
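The branching logic described above can be sketched in a few lines. The question identifiers and the 40% threshold below are invented for illustration; real adaptive engines use calibrated item statistics:

```python
# Toy branching logic: route by specialism, then drop to simpler items
# if the candidate is scoring below 40% on that area.

QUESTIONS = {
    ("A", "standard"): ["A1", "A2", "A3"],
    ("A", "simpler"):  ["A1-easy", "A2-easy"],
    ("B", "standard"): ["B1", "B2", "B3"],
    ("B", "simpler"):  ["B1-easy", "B2-easy"],
}

def next_section(specialism, correct, attempted):
    """Pick the next question set for a candidate."""
    level = "standard"
    if attempted and correct / attempted < 0.4:
        level = "simpler"
    return QUESTIONS[(specialism, level)]

print(next_section("A", correct=1, attempted=5))  # 20% -> simpler topic A items
```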
Many organizations are moving to a “LOFT” Model, which stands for linear-on-the-fly
testing. With this model organizations can set up question banks where each question can be
associated with different tags – these tags might be learning outcomes, syllabus topics, difficulty
rating or job roles, and then automatic picking rules can be defined for each exam paper. This
means that randomized papers can be generated for each candidate. Each test uses different questions, but they are equivalent in terms of what they are assessing. The big positive is that papers are generated automatically, so once the exam bank has been set up, it is just a case of maintaining it, as opposed to having to continually generate new exam papers.
The LOFT model also reduces the ability for candidates to attempt to collaborate on
questions or share exam content online, as everyone is getting a different set of questions.
Because many questions can be auto-scored, results can be issued very quickly with clear
feedback on learning outcomes and areas for improvement.
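A LOFT-style picker along the lines described above might look like this. The bank, tags, and picking rules are invented for illustration; the point is that every candidate gets a different random draw satisfying the same rules:

```python
# Minimal linear-on-the-fly (LOFT) paper generation from a tagged bank.
import random

bank = [
    {"id": "Q1", "tags": {"algebra", "easy"}},
    {"id": "Q2", "tags": {"algebra", "hard"}},
    {"id": "Q3", "tags": {"geometry", "easy"}},
    {"id": "Q4", "tags": {"geometry", "hard"}},
    {"id": "Q5", "tags": {"algebra", "easy"}},
    {"id": "Q6", "tags": {"geometry", "easy"}},
]

def generate_paper(rules, rng):
    """rules: list of (required_tag, count). Returns one randomized,
    rule-equivalent paper with no repeated questions."""
    paper, used = [], set()
    for tag, count in rules:
        pool = [q for q in bank if tag in q["tags"] and q["id"] not in used]
        picked = rng.sample(pool, count)
        paper.extend(q["id"] for q in picked)
        used.update(q["id"] for q in picked)
    return paper

rules = [("algebra", 2), ("geometry", 2)]
print(generate_paper(rules, random.Random(7)))   # one candidate's paper
print(generate_paper(rules, random.Random(42)))  # another's, same rules
```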
Another area where there are increasing levels of automation is systems integration.
Typically there are integration points so that candidate details can automatically flow from a
Learning Management System (LMS), a Customer Relationship Management (CRM) system or an HR system directly into the assessment solution. Candidates then get automatically enrolled in
the correct exam. There may be various other integration points, such as to push results from the
assessment system into a system of record after the exam.
19
There are also a lot of advancements being made in the area of automated marking, where
it’s now possible to auto-score a wide range of question types that can include text answers.
At Test Reach we have a wide range of customers from many different backgrounds. We
work with educational institutes, professional bodies, governments, corporations – and one thing
they all have in common is placing a huge emphasis on the quality of the candidate experience.
Other ways in which online assessment makes things more candidate-friendly are the speed of turnaround of results and the ease of providing detailed feedback. Candidates do not just want to know their grade but also to understand the areas in which they were strong and those areas where they may need to improve. This is becoming a very important area of focus for many organizations, as this kind of feedback provides a lot of benefits, particularly for candidates who fail.
During the expansion of testing programmes there has also been some concern that there
is too much testing in the schools, especially for high school students. In addition to taking the
tests in the local school programmes, these students may also have to take one or more state
competency tests and several college admissions tests. It is feared that the heavy demand on their time and energy might distract them from their schoolwork and that external testing programmes may cause undesirable shifts in the school's curriculum.
Teachers are responsible for providing instruction by identifying teaching practices that
are effective for all students, since not every student learns or retains information in the same
way. This is where teachers get to be creative in how they engage students in learning.
Assessments are the tools and methods educators use to determine what students know and are able to do. Assessments range from teacher questioning techniques to statewide assessments like PARCC. Assessments are only useful if they provide information that is used to improve student
learning. Assessment inspires us to ask these hard questions: "Are we teaching what we think we
are teaching?" "Are students learning what they are supposed to be learning?" "Is there a way to
teach the subject better, thereby promoting better learning?"
Qualifications for test use vary according to the types of tests in question, but in general are stricter for tests having greater potential for harm and misinterpretation. One of the earliest systems by which user qualifications were specified, provided by the first APA test standards, classified tests according to three levels of complexity.
Level A tests are those that can be administered, scored, and interpreted by responsible
non-psychologists who have carefully read the test manual and are familiar with the overall
purposes of testing. Educational achievement tests fall into this category.
Level B tests require technical knowledge of test construction and use and appropriate
advanced coursework in psychology and related courses (e.g. statistics, individual differences,
and counseling). Vocational interest inventories, group intelligence and special aptitude tests,
and some personality inventories are considered Level B tests. For example, Consulting Psychologists Press generally limits purchase of tests such as the Strong-Campbell Interest Inventory, the State-Trait Anxiety Inventory, the Myers-Briggs Type Indicator, and the Bem Sex Role Inventory to people who have completed university courses in tests and measurements or equivalent training. Similar requirements for access to tests such as the Jackson Vocational Interest Survey, the Personality Research Form, and the Jackson Personality Inventory are stated by their publisher, Research Psychologists Press.
Q#05
The mean, median and mode are all valid measures of central tendency, but under
different conditions, some measures of central tendency become more appropriate to use than
others. In the following sections, we will look at the mean, mode, and median, and learn how to calculate them and under what conditions they are most appropriate to use.
Arithmetic mean
The arithmetic mean is the most commonly used average and is usually called simply the mean or the average. The arithmetic mean is defined as the number obtained by dividing the sum of the scores by their number. It is denoted by putting a bar over the variable symbol, e.g., X̄ (read as "X bar"). Its formula is:
Mean = ∑X / N
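In code, with the scores held in a list, the formula reads:

```python
# Mean = sum of scores / number of scores
def arithmetic_mean(scores):
    return sum(scores) / len(scores)

print(arithmetic_mean([70, 80, 90]))  # 80.0
```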
How the Arithmetic Mean Works
The arithmetic mean maintains its place in finance, as well. For example, mean earnings
estimates typically are an arithmetic mean. Say you want to know the
average earnings expectation of the 16 analysts covering a particular stock. Simply add up all the
estimates and divide by 16 to get the arithmetic mean.
The same is true if you want to calculate a stock’s average closing price during a
particular month. Say there are 23 trading days in the month. Simply take all the prices, add them
up, and divide by 23 to get the arithmetic mean.
The arithmetic mean is simple, and most people with even a little bit of finance and math
skill can calculate it. It’s also a useful measure of central tendency, as it tends to provide useful
results, even with large groupings of numbers.
The arithmetic mean isn't always ideal, especially when a single outlier can skew the
mean by a large amount. Let’s say you want to estimate the allowance of a group of 10 kids.
Nine of them get an allowance between $10 and $12 a week. The tenth kid gets an allowance of
$60. That one outlier is going to result in an arithmetic mean of $16. This is not very
representative of the group.
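The allowance example can be checked numerically. The nine small allowances below are hypothetical values chosen to fall in the $10 to $12 range described above:

```python
import statistics

# Sketch of how a single outlier skews the mean: nine hypothetical weekly
# allowances between $10 and $12, plus one outlier of $60.
allowances = [10, 11, 12, 10, 11, 12, 11, 11, 12, 60]

print(sum(allowances) / len(allowances))  # mean   -> 16.0
print(statistics.median(allowances))      # median -> 11.0
```

The mean of $16 sits above every allowance except the outlier's, while the median of $11 stays with the typical child.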
The arithmetic mean also isn’t great when calculating the performance of investment
portfolios, especially when it involves compounding, or the reinvestment of dividends and
earnings. It is also generally not used to calculate present and future cash flows, which analysts
use in making their estimates. Doing so is almost sure to lead to misleading numbers.
The arithmetic mean can be misleading when there are outliers or when looking at
historical returns. The geometric mean is most appropriate for series that exhibit serial
correlation. This is especially true for investment portfolios.
For example, suppose a portfolio returns +20%, +6%, −10%, −1% and +6% over five
years. The arithmetic mean of these returns is 4.2% per year, but the geometric mean would
instead be calculated as (1.2 × 1.06 × 0.9 × 0.99 × 1.06)^(1/5) − 1 = 3.74% per year average
return. Note that the geometric mean, the more accurate calculation in this case, can never
exceed the arithmetic mean.
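The five-year return calculation can be sketched as follows, using the returns implied by the factors in the formula (1.2 corresponds to +20%, 0.9 to −10%, and so on):

```python
# Compare the arithmetic and geometric means of five annual returns:
# +20%, +6%, -10%, -1%, +6%.
returns = [0.20, 0.06, -0.10, -0.01, 0.06]

# Arithmetic mean: simple average of the returns.
arithmetic = sum(returns) / len(returns)

# Geometric mean: compound the growth factors, then take the n-th root.
product = 1.0
for r in returns:
    product *= 1 + r
geometric = product ** (1 / len(returns)) - 1

print(round(arithmetic, 4))  # 0.042  ->  4.2% per year
print(round(geometric, 4))   # 0.0374 ->  3.74% per year
```

The geometric mean comes out lower because it accounts for compounding: a −10% year hurts a compounded portfolio more than a simple average suggests.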
Moreover, the mean is essentially a model of your data set: a single value chosen to
summarise all of the scores. You will notice, however, that the mean is often not one of the
actual values observed in your data set. One of its important properties is that it minimises error
in the prediction of any one value in your data set; more precisely, it is the value that produces
the lowest total squared error from all the other values in the data set.
An important property of the mean is that it includes every value in your data set as
part of the calculation. In addition, the mean is the only measure of central tendency where the
sum of the deviations of each value from the mean is always zero.
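Both properties, the minimum-squared-error property and the zero-sum of deviations, can be verified on a small hypothetical score set:

```python
# Two properties of the mean, checked on five hypothetical scores.
scores = [4, 8, 6, 5, 7]
mean = sum(scores) / len(scores)  # 6.0

# Property 1: the deviations of each value from the mean sum to zero.
deviations = [x - mean for x in scores]
print(sum(deviations))  # 0.0

# Property 2: the mean gives a lower sum of squared errors than any
# other centre value would.
def sum_squared_error(centre):
    return sum((x - centre) ** 2 for x in scores)

print(sum_squared_error(mean))  # 10.0
print(sum_squared_error(5))     # 15.0
print(sum_squared_error(7))     # 15.0
```

Shifting the centre away from the mean in either direction increases the total squared error, which is what "minimises error" means here.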
The mean has one main disadvantage: it is particularly susceptible to the influence of
outliers. These are values that are unusual compared to the rest of the data set by being especially
small or large in numerical value. For example, consider the wages of staff at a factory below:
Staff:   1    2    3    4    5    6    7    8    9    10
Salary:  15k  18k  16k  14k  15k  15k  12k  17k  90k  95k
The mean salary for these ten staff is $30.7k. However, inspecting the raw data suggests
that this mean value might not be the best way to accurately reflect the typical salary of a worker,
as most workers have salaries in the $12k to $18k range. The mean is being skewed by the two
large salaries. Therefore, in this situation, we would like to have a better measure of central
tendency. As we will find out later, taking the median would be a better measure of central
tendency in this situation.
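The factory-salary figures above make the comparison concrete (values in thousands of dollars):

```python
import statistics

# The ten factory salaries from the table above, in thousands of dollars.
salaries = [15, 18, 16, 14, 15, 15, 12, 17, 90, 95]

print(sum(salaries) / len(salaries))  # mean   -> 30.7
print(statistics.median(salaries))    # median -> 15.5
```

The median of $15.5k falls inside the $12k to $18k range where most salaries actually lie, while the mean of $30.7k is pulled upward by the two large salaries.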
Another time when we usually prefer the median over the mean (or mode) is when our
data is skewed (i.e., the frequency distribution for our data is skewed). If we consider the normal
distribution - as this is the most frequently assessed in statistics - when the data is perfectly
normal, the mean, median and mode are identical. Moreover, they all represent the most typical
value in the data set. As the data becomes skewed, however, the mean loses its ability to provide
the best central location, because the skewed values drag it away from the typical value. The
median, by contrast, best retains this central position and is not as strongly influenced by the
skewed values. This is explained in more detail in the skewed distribution section later in this
guide.
Conclusion
To calculate the central tendency of a given data set, we use different measures such as
the mean, median and mode. Among these measures, the arithmetic mean is often considered
the best, because it includes every value in the data set: if any value in the data set changes, the
mean changes with it, which is not necessarily the case for the median or mode.
Note also that the arithmetic mean can be negative. Since the data can lie anywhere on
the number line, the mean value can be negative, positive or zero.