Mod 4 Measurability, Data Collection, Sampling
Dichotomous scale: The dichotomous scale is used to elicit a yes or no answer, as in the example below.
Note that a nominal scale is used to elicit the response.
Example: Do you own a car? Yes / No
Category scale: The category scale uses multiple items to elicit a single response, as in the example below.
This also uses the nominal scale.
Example: Where in northern California do you reside? North Bay, South Bay, East Bay, Peninsula, Other
Likert scale: The Likert scale is designed to examine how strongly subjects agree or disagree with
a statement on a 5-point scale with the following anchors:
Strongly disagree - 1
Disagree - 2
Neither agree nor disagree - 3
Agree - 4
Strongly agree - 5
This is an interval scale, and the difference in response between any two adjacent points on the scale
remains the same.
Scaling, Reliability and Validity
Semantic differential scale: Several bipolar attributes are identified at the extremes of the scale, and
respondents are asked to indicate their attitudes on what may be called a semantic scale. This is treated as
an interval scale.
Example:
Responsive --- Unresponsive
Beautiful --- Ugly
Numerical scale: The numerical scale is similar to the semantic differential scale, with the difference that
numbers on a 5-point or 7-point scale are provided. This is also an interval scale.
Example: How pleased are you with your new real estate agent?
Extremely pleased 7 6 5 4 3 2 1 Extremely displeased
Itemized rating scale: A 5-point or 7-point scale with anchors, as needed, is provided for each item, and
the respondent states the appropriate number beside each item, as per the examples that follow.
The responses to the items are then summated. This uses an interval scale.
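The summation step can be sketched in a few lines of Python. This is only an illustration: the three items, the ratings, and the helper name `summated_score` are hypothetical and are not taken from the source examples.

```python
# Hypothetical ratings given by one respondent to three items on a 5-point scale.
responses = {"item_1": 4, "item_2": 5, "item_3": 3}

def summated_score(item_responses, low=1, high=5):
    """Sum the item ratings after checking that each one lies on the scale."""
    for item, rating in item_responses.items():
        if not low <= rating <= high:
            raise ValueError(f"{item}: rating {rating} is outside {low}-{high}")
    return sum(item_responses.values())

score = summated_score(responses)
print(score)  # 12
```

The range check is a design choice of this sketch; it simply guards against ratings that fall outside the anchors before they are added up.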
Fixed or constant sum scale: Respondents are asked to distribute a given number of points across
various items, as per the example below. This is more in the nature of an ordinal scale.
Example: In choosing a toilet soap, indicate the importance you attach to each of the aspects by allotting
points for each, to total 100 in all.
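A constant sum response is only usable if the allotted points add up to the stated total. The sketch below checks that and then reads an order of importance off the points. The aspect names, the point values, and the function name `is_valid_allocation` are assumptions made for illustration.

```python
def is_valid_allocation(points, total=100):
    """True when no item gets negative points and the points add up to `total`."""
    return all(p >= 0 for p in points.values()) and sum(points.values()) == total

# Hypothetical answer from one respondent (aspect names and points are assumed).
points = {"fragrance": 30, "colour": 10, "shape": 5, "size": 15, "lather": 40}

valid = is_valid_allocation(points)

# The allotted points imply an order of importance among the aspects.
ranking = sorted(points, key=points.get, reverse=True)
print(valid, ranking)
```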
Stapel scale: This scale simultaneously measures both the direction and intensity of the attitude toward the
items under study. The characteristic of interest to the study is placed at the center, with a numerical scale
ranging, say, from +3 to -3 on either side of the item. Since this does not have an absolute zero point, this is an
interval scale.
Graphic rating scale: A graphical representation helps respondents indicate their answers to a particular
question by placing a mark at the appropriate point on a line, as in the following
example. This is an ordinal scale, though the following example might make it look like an
interval scale.
RANKING SCALE: Ranking scales are used to tap preferences between two or among more objects or
items (ordinal in nature). However, such ranking may not give definitive clues to some of the answers
sought.
Example: There are 4 product lines, and the manager seeks information that would help decide which
product line should get the most attention. Assume that:
35% of respondents choose the 1st product.
25% of respondents choose the 2nd product.
20% of respondents choose the 3rd product.
20% of respondents choose the 4th product.
(Total: 100%)
The manager cannot conclude that the first product is the most preferred. Why? Because 65% of
respondents did not choose that product.
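The arithmetic behind the manager's problem can be checked directly. The sketch below uses the percentages from the example; the variable names are illustrative only.

```python
# Share of respondents (in percent) choosing each product line, from the example.
shares = {"product 1": 35, "product 2": 25, "product 3": 20, "product 4": 20}

top_choice = max(shares, key=shares.get)
chose_something_else = sum(shares.values()) - shares[top_choice]
has_majority = shares[top_choice] > 50

print(top_choice, chose_something_else, has_majority)  # product 1 65 False
```

The most frequently chosen product is first only by plurality: it falls short of a majority, which is why the ranking alone is not conclusive.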
Alternative methods such as forced choice, paired comparisons, and the comparative scale have to be
used instead.
Forced Choice
The forced choice enables respondents to rank objects relative to one another, among
the alternatives provided. This is easier for the respondents, particularly if the number of choices to be
ranked is limited.
Example: Rank the following newspapers that you would like to subscribe to in order of
preference, assigning 1 to the most preferred choice and 5 to the least preferred.
Paired-Comparison Scale
Using the paired-comparison scale, the participant can express attitudes unambiguously by choosing
between two objects. The number of judgments required in a paired comparison is [(n)(n-1)/2], where n is
the number of stimuli or objects to be judged. Paired comparisons run the risk that participants will tire to
the point that they give ill-considered answers or refuse to continue. Paired comparisons provide ordinal
data.
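The growth in the number of judgments, which is what tires participants, can be seen by evaluating n(n-1)/2 and listing the pairs. In this sketch the object labels and the function name `judgments_required` are hypothetical.

```python
from itertools import combinations

def judgments_required(n):
    """Number of paired comparisons needed for n objects: n(n-1)/2."""
    return n * (n - 1) // 2

# Hypothetical set of five objects to be compared two at a time.
objects = ["A", "B", "C", "D", "E"]
pairs = list(combinations(objects, 2))

print(len(pairs), judgments_required(5))  # 10 10
print(judgments_required(10))             # 45
```

Doubling the number of objects from 5 to 10 more than quadruples the judgments, from 10 to 45.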
Comparative scale
The comparative scale provides a benchmark or a point of reference to assess attitudes toward the
current object, event, or situation under study. An example of the use of the comparative scale follows.
When using a comparative scale, the participant compares an object against a standard. The comparative scale
is ideal for such comparisons if the participants are familiar with the standard. Some researchers treat the
data produced by comparative scales as interval data, since the scoring reflects an interval between the
standard and what is being compared, but the text recommends treating the data as ordinal unless the
linearity of the variables in question can be supported.
Goodness of Measures
The reliability coefficient obtained with a repetition of the same measure on a second occasion is
called test–retest reliability.
That is, when a questionnaire containing some items that are supposed to measure a concept is
administered to a set of respondents now, and again to the same respondents, say several
weeks to 6 months later, then the correlation between the scores obtained at the two different
times from one and the same set of respondents is called the test–retest coefficient.
The higher it is, the better the test–retest reliability, and consequently, the stability of the measure
across time.
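As a minimal sketch, the test-retest coefficient can be computed as the Pearson correlation between the two sets of scores. The scores and the helper name `pearson_r` are hypothetical, and the Pearson formula is an assumption of this sketch, since the slides do not name a specific correlation.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical total scores of the same five respondents on two occasions.
scores_now = [10, 12, 14, 16, 18]
scores_later = [11, 12, 15, 15, 19]

test_retest_r = pearson_r(scores_now, scores_later)
print(round(test_retest_r, 2))  # 0.96
```

A coefficient this close to 1 would indicate that respondents kept nearly the same relative standing across the two administrations.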
When responses on two comparable sets of measures tapping the same construct are highly
correlated, we have parallel-form reliability.
Both forms have similar items and the same response format, the only changes being the wordings
and the order or sequence of the questions. What we try to establish here is the error variability
resulting from wording and ordering of the questions.
If two such comparable forms are highly correlated (say, .8 and above), we may be fairly certain that
the measures are reasonably reliable, with minimal error variance caused by wording, ordering,
or other factors.
Interitem consistency reliability is a test of the consistency of respondents' answers to all the items in a measure.
To the degree that items are independent measures of the same concept, they will be correlated with
one another.
The most popular test of interitem consistency reliability is Cronbach's coefficient alpha
(Cronbach's alpha; Cronbach, 1946), which is used for multipoint-scaled items, and the
Kuder–Richardson formulas (Kuder & Richardson, 1937), used for dichotomous items.
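For multipoint items, Cronbach's alpha is commonly computed as alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The sketch below applies that formula to a small, made-up data set; the responses and the function name `cronbach_alpha` are hypothetical.

```python
from statistics import variance

def cronbach_alpha(rows):
    """rows: one list of item scores per respondent (all lists the same length)."""
    k = len(rows[0])                          # number of items
    items = list(zip(*rows))                  # transpose: one tuple of scores per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Hypothetical answers of five respondents to three 5-point items.
responses = [
    [2, 3, 3],
    [4, 4, 5],
    [3, 3, 4],
    [5, 4, 5],
    [1, 2, 2],
]

alpha = cronbach_alpha(responses)
print(round(alpha, 2))  # 0.96
```

Sample variances (n - 1 in the denominator) are used for both the items and the totals, so the choice of denominator cancels out in the ratio.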
Split-half reliability reflects the correlations between two halves of an instrument. The estimates would vary
depending on how the items in the measure are split into two halves.
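That the estimate depends on the split can be seen with a small, made-up data set. This sketch correlates the two half scores under two different splits; the responses and helper names are hypothetical, and the Pearson correlation is an assumed choice.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

def half_scores(rows, item_indexes):
    """Score of each respondent on the chosen subset of items."""
    return [sum(row[i] for i in item_indexes) for row in rows]

# Hypothetical answers of five respondents to four 5-point items.
rows = [
    [2, 3, 3, 2],
    [4, 4, 5, 3],
    [3, 3, 4, 4],
    [5, 4, 5, 5],
    [1, 2, 2, 1],
]

# Split 1: first two items versus last two items.
r_first_second = pearson_r(half_scores(rows, [0, 1]), half_scores(rows, [2, 3]))
# Split 2: items 1 and 3 versus items 2 and 4.
r_odd_even = pearson_r(half_scores(rows, [0, 2]), half_scores(rows, [1, 3]))

print(round(r_first_second, 2), round(r_odd_even, 2))  # 0.95 0.96
```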
Split-half reliabilities could be higher than Cronbach's alpha only in the circumstance of there being
more than one underlying response dimension tapped by the measure and when certain other
conditions are met as well (for complete details, refer to Campbell, 1976).
Hence, in almost all cases, Cronbach's alpha can be considered a perfectly adequate index of the
interitem consistency reliability.
It should be noted that the consistency of the judgment of several raters on how they view a
phenomenon or interpret some responses is termed interrater reliability, and should not be confused
with the reliability of a measuring instrument.
VALIDITY
Earlier, we examined the terms internal validity and external validity in the context of experimental designs.
That is, we were concerned about the issue of the authenticity of the cause-and-effect relationships
(internal validity), and their generalizability to the external environment (external validity).
We are now going to examine the validity of the measuring instrument itself.
That is, when we ask a set of questions (i.e., develop a measuring instrument) with the hope that we are
tapping the concept, how can we be reasonably certain that we are indeed measuring the concept we set
out to measure and not something else?
This can be determined by applying certain validity tests. Several types of validity tests are used to test the
goodness of measures and writers use different terms to denote them. For the sake of clarity, we may group
validity tests under three broad headings: content validity, criterion-related validity, and construct validity.
Content Validity
Content validity ensures that the measure includes an adequate and representative set of items that tap the
concept. The more the scale items represent the domain or universe of the concept being measured, the
greater the content validity.
To put it differently, content validity is a function of how well the dimensions and elements of a concept have
been delineated.
A panel of judges can attest to the content validity of the instrument. Kidder and Judd (1986) cite the
example where a test designed to measure degrees of speech impairment can be considered as having
validity if it is so evaluated by a group of expert judges (i.e., professional speech therapists).
Face validity is considered by some to be a basic and very minimal index of content validity. Face validity
indicates that the items intended to measure a concept do, on the face of it, look like they measure the
concept. Some researchers do not see fit to treat face validity as a valid component of content validity.
Construct Validity
Construct validity testifies to how well the results obtained from the use of the measure fit the
theories around which the test is designed. This is assessed through convergent and discriminant
validity, which are explained below. Convergent validity is established when the scores obtained with two
different instruments measuring the same concept are highly correlated. Discriminant validity is
established when, based on theory, two variables are predicted to be uncorrelated, and the scores
obtained by measuring them are indeed empirically found to be so.
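The two checks can be sketched as correlations: a high one between two instruments measuring the same concept, and one near zero between variables that theory predicts are unrelated. All scores, variable names, and the helper `pearson_r` below are hypothetical, and the Pearson correlation is an assumed choice.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical scores of five respondents.
instrument_a = [1, 2, 3, 4, 5]        # concept measured with instrument A
instrument_b = [2, 2, 3, 5, 5]        # same concept, different instrument
unrelated_variable = [2, 1, 3, 1, 2]  # variable predicted to be uncorrelated

convergent_r = pearson_r(instrument_a, instrument_b)
discriminant_r = pearson_r(instrument_a, unrelated_variable)

print(round(convergent_r, 2))       # 0.94
print(abs(discriminant_r) < 0.01)   # True
```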
OTHER METHODS OF DATA COLLECTION
Observational Surveys
Whereas interviews and questionnaires elicit responses from the subjects, it is possible to gather data
without asking questions of respondents. People can be observed in their natural work environment
or in the lab setting, and their activities and behaviors or other items of interest can be noted and
recorded.
Apart from the activities performed by the individuals under study, their movements, work habits,
the statements made and meetings conducted by them, their facial expressions of joy, anger, and
other emotions, and body language can be observed. Other environmental factors such as layout,
work-flow patterns, the closeness of the seating arrangement, and the like, can also be noted.
The researcher can play one of two roles while gathering field observational data—that of a
nonparticipant-observer or participant-observer.
Nonparticipant-Observer
As a nonparticipant-observer, the researcher may collect the needed data without becoming an integral
part of the organizational system.
For example, the researcher might sit in the corner of an office and watch and record how the manager
spends her time.
Observation of all the activities of managers, over a period of several days, will allow the researcher
to make some generalizations on how managers typically spend their time. By merely observing
the activities, recording them systematically, and tabulating them, the researcher is able to come
up with some findings.
This, however, requires observers to be physically present at the workplace for
extended periods of time, which makes observational studies time consuming.
Participant-Observer
Here, the researcher enters the organization or the research setting, and becomes a part of the work
team.
For instance, if a researcher wants to study group dynamics in work organizations, then she may
join the organization as an employee and observe the dynamics in groups while being a part of the
work organization and work groups.
Much anthropological research is conducted in this manner, where researchers become a part of the
alien culture, which they are interested in studying in depth.
Projective Methods
Certain ideas and thoughts that cannot be easily verbalized or that remain at the unconscious levels
in the respondents' minds can usually be brought to the surface through motivational research.
This is typically done by trained professionals who apply different probing techniques in order to
bring to the surface deep-rooted ideas and thoughts in the respondents.
Familiar techniques for gathering such data are word associations, sentence completion, thematic
apperception tests (TAT), inkblot tests, and the like.
ETHICS IN DATA COLLECTION
As previously noted, ethical issues in data collection pertain to those who sponsor the research,
those who collect the data, and those who offer them.
The sponsors should ask for the study to be done to better the purpose of the organization, and not for
any other self-serving reason.
They should respect the confidentiality of the data obtained by the researcher, and not ask for the
individual or group responses to be disclosed to them, or ask to see the questionnaires.
They should have an open mind in accepting the results and recommendations in the report presented
by the researchers.
NORMALITY OF DISTRIBUTIONS
For instance, when attributes such as height and weight are considered, most people will be
clustered around the mean, leaving only a small number at the extremes who are either very tall or
very short, very heavy or very light, and so on, as indicated in Figure 11.2.
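The clustering around the mean can be quantified for a normal distribution. The sketch below assumes, purely for illustration, that height is normally distributed with a mean of 170 cm and a standard deviation of 10 cm; these numbers are not from the source.

```python
from statistics import NormalDist

# Assumed distribution of height in cm (mean and standard deviation are hypothetical).
heights = NormalDist(mu=170, sigma=10)

# Proportion of people within one standard deviation of the mean.
within_one_sd = heights.cdf(180) - heights.cdf(160)

# Proportion at the extremes, more than two standard deviations from the mean.
beyond_two_sd = 1 - (heights.cdf(190) - heights.cdf(150))

print(round(within_one_sd, 3), round(beyond_two_sd, 3))  # 0.683 0.046
```

Roughly two thirds of the population lies within one standard deviation of the mean, while under 5% lies in the two tails combined.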
If our sampling design and sample size are right, however, the sample mean X̄ will be within close
range of the true population mean µ.
Thus, through appropriate sampling designs, we can ensure that the sample subjects are not
chosen from the extremes, but are truly representative of the properties of the population.
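A small simulation can illustrate the point. It compares a sample drawn with equal chance for every element (simple random sampling, used here as one assumed example of an appropriate design) against a sample taken only from one extreme. The population, the sample size of 400, and the seed are all hypothetical.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 values scattered around 50.
population = [rng.gauss(50, 10) for _ in range(10_000)]
mu = mean(population)

# Simple random sample: every element has the same chance of selection.
random_sample = rng.sample(population, 400)

# Sample drawn only from one extreme: the 400 largest values.
extreme_sample = sorted(population)[-400:]

random_error = abs(mean(random_sample) - mu)
extreme_error = abs(mean(extreme_sample) - mu)
print(random_error < extreme_error)  # True
```

The mean of the random sample lands close to the population mean, while the mean of the extreme sample is far from it.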
The more representative of the population the sample is, the more generalizable are the findings of the
research. Recall that generalizability is one of the hallmarks of scientific research, as we read earlier.
Though our concern about generalizability may make us particular about choosing
representative samples for most research, some cases may not call for such concern for
generalizability.
For instance, at the exploratory stages of fact finding, we may be interested only in "getting a handle on"
the situation, and therefore limit the interviews to only the most conveniently available people.
The same is true when time is of the essence and the urgency of getting information takes
priority over a high level of accuracy.
For instance, a film agency might want to find out quickly the impact on the viewers of a newly
released film exhibited the previous evening.
The interviewer might question the first 20 people leaving the theater after seeing the film and obtain their
reactions.
On the basis of their replies, she may form an opinion as to the likely success of the film.
NONPROBABILITY SAMPLING
Convenience Sampling
Those inclined to take the test might form the sample for the study of how many people prefer
Pepsi over Coke or product X to product Y.
Purposive Sampling
Judgment Sampling
Quota Sampling