Learning Unit 8 - 10044701
Learning unit 8 | RSC2601
Data analysis and interpretation
8.1. INTRODUCTION
This learning unit introduces you to fundamental methods used to analyse and
interpret quantitative and qualitative data in the social sciences. The unit outlines basic
steps and techniques used to summarise voluminous data from a research study into
meaningful and easily comprehensible information. In brief, we introduce you to
essential quantitative data analysis techniques such as
descriptive statistics, frequency distribution tables, graphs, measures of central
tendency, and correlations between study variables. We also introduce you to
methods used to analyse and interpret qualitative data, which include, among others,
thematic analysis, constant comparative analysis, narrative analysis and
phenomenological analysis.
After the completion of this learning unit/lesson, you should be able to:
• explain the meaning of data analysis
• distinguish between quantitative and qualitative data analysis techniques
• interpret basic aspects of quantitative and qualitative data
• explain the role of descriptive statistics
• use frequency distribution tables and graphs to analyse your data
• distinguish between measures of central tendency — mean, median and mode
• distinguish between measures of variability — range, variance and standard
deviation
• explain the concept of correlation in research and interpret different relationship
patterns modelled on a scatter plot
• draw up a scatter plot based on a raw data set
• explain the concept of qualitative data analysis
• describe the purpose of qualitative data analysis
• identify, define and explain the various strategies for analysing qualitative data
• describe a stepwise format or plan, with appropriate examples, to analyse the
data following the various data analysis strategies
• implement the various qualitative data analysis strategies to analyse qualitative
data, with appropriate examples in a stepwise format
• explain how qualitative data can be interpreted
When conducting research, we often collect large volumes of information, called data.
The information collected is often unstructured and not yet coherent, logical or meaningful. In
other words, one cannot really make sense of it. To ensure that the data we collect is
meaningful and understandable, we first need to analyse or break it down in ways that
will make it comprehensible. This process is known as data analysis 1. Data analysis
refers to the process of organising the collected data, in some order or format, in order
to make meaning (Bhome et al., 2013) of the phenomenon under study. In the previous
study unit, you learnt about different ways in which we collect information (data). This
information is often presented in the form of numbers. For example: the number of
people in a survey who indicated that they have been victims of armed robbery; how
a sample of respondents rated a food product on a five-point scale (with higher scores
indicating a more positive rating); or how a group of students scored on a verbal
reasoning test. Also, a great deal of non-numerical data can be represented in a
numerical form. This involves coding or assigning certain numbers to the categories
of a variable. An example would be to code male as 1 and female as 2. Have you ever
completed a questionnaire where the code categories have already been placed on
the questionnaire? This means that, instead of just indicating that you are “male” or
1
Data analysis refers to the process of organising the information collected in a research study to make
meaning of the phenomenon under investigation.
“female”, you mark the 1 or the 2. Such a questionnaire has been pre-coded. The
reason for coding is that we need to transform our raw data into a format that can be
used in computer analyses.
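The coding step described above can be sketched in a few lines of Python. This is only an illustration: the codes (male = 1, female = 2) come from the example in the text, while the list of responses and the names `GENDER_CODES` and `raw_responses` are made up for the purpose of the sketch.

```python
# Codebook taken from the example in the text: male = 1, female = 2.
GENDER_CODES = {"male": 1, "female": 2}

# Hypothetical raw answers from five respondents (not real data).
raw_responses = ["female", "male", "female", "female", "male"]

# Replace each category label with its numerical code.
coded = [GENDER_CODES[answer] for answer in raw_responses]

print(coded)
```

Note that the numbers are only labels for categories. A code of 2 does not mean "more" of anything than a code of 1.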
We should keep in mind that data collection is not an end in itself, but forms part of a
research process aimed at answering a specific research question. In this study unit,
we will show you how you can organise numerical or quantitative data, to help you
meet this aim. When designing your research (i.e., before the data are collected), you
already need to consider what you are going to do with your data. This will ensure that
the data you collect can be analysed in a way that will provide answers to all your
research questions.
Activity 8.1
Before we discuss how to organise and analyse your data, we would like you to think
about examples from your own life (or from what family or friends have told you), where
you have provided information about yourself, whether it was information about your
feelings, attitudes, opinions, experiences, etcetera. Maybe, you completed a
questionnaire on your vocational interests at school or, perhaps, you were asked
which political party you would vote for in an election. Another example is the forms
we find in restaurants, shops, service stations, etcetera, for rating service quality. We
also often find questionnaires in magazines (e.g., on our attitude towards abortion or
our beliefs about Aids). All these are examples of data collection, since some
information was sought from you. Data collection is a process that precedes data
analysis in the steps of the research process.
2
Descriptive statistics refers to mathematical techniques used to see underlying patterns of data.
use this as evidence for his or her arguments and claims about the topic the researcher
investigated. Statistics are often used in both popular literature (magazines,
newspapers, etc.) and scientific articles to support an argument.
Activity 8.2
The Human Sciences Research Council (HSRC) often conducts surveys on South
African social attitudes. The surveys focus on a range of issues, such as poverty,
inequality, racial redress, and service delivery, among others. In 2010, the HSRC
published a report from a survey on South African Social Attitudes (HSRC, 2010). In
one of the questions in the survey, 5 583 South Africans were asked whether their
local council became more efficient at responding to their needs over the past five
years. Of the 5 583 participants, 29% responded “Yes”, 46% responded “No”, and
25% responded “Don’t know” to the question. The results, therefore, show that
“No” was the most common response, although it fell short of an outright majority
(more than 50%). Do you agree with the
results of the survey that local councils did not become more efficient at responding to the
needs of South Africans over the past 5 years, and what are you basing your answer
on? Figure 8.1 shows a pie chart based on participants’ responses to this question.
[Pie chart: Yes 29%; No 46%; Don’t know 25%]
Figure 8.1: Local council efficiency in responding to needs in the past 5 years
From the results presented in the pie chart, the largest group of respondents (46%)
indicated that local councils did not become more efficient at responding to their needs
in the past 5 years, 25% responded that they did not know whether
local councils became more efficient at responding to their needs in the past 5 years,
while 29% indicated that local councils became more efficient at
responding to their needs in the past 5 years.
In the social sciences, it is important to consider the methods which were used to
conduct the study, before accepting research results. In the results presented in the
pie chart, there were only 5 583 South Africans who took part in the survey (meaning,
not all the South African population took part in the survey, but a sample). Additionally,
different responses were provided, based on the participants’ experiences with their
local councils. This means that, when interpreting research results, the researcher
needs to consider contextual and related factors (research design, sampling strategy,
participants’ characteristics, etc.) that may have an influence on the results of the study
in order to draw appropriate conclusions.
You have seen that the researcher has to provide sufficient information and that you
need to understand the procedures that were used, before you can interpret the
statistical results. We will now work systematically through the procedures used to
compile various descriptive statistics. Being able to interpret descriptive statistics helps
you to evaluate claims more carefully, rather than blindly accepting statistical data.
You also need to be able to apply descriptive statistics if you want to summarise and
report trends in your own data.
One way in which to summarise data, so that the overall pattern of the data becomes
clear, is to create a frequency distribution. Such a distribution indicates the number of
cases in a data set that obtained a particular score or that fall in a particular category
of a variable. Frequency distribution 3 is therefore the grouping of raw data. Suppose
we obtain scores on a colour awareness test for a sample of first-year engineering
students. Our data set consists of the scores for all the students in the sample. We
group this raw data (the scores) by indicating how many cases (referring to the number
of students or their scores) obtained a score of zero; how many obtained a score of 1;
etcetera. The number of cases is called the frequency of that score, or category, and
the symbol f is used to refer to frequency.
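As a rough illustration of how raw scores are turned into a frequency distribution, consider the sketch below. The scores are hypothetical (the unit does not list the colour awareness scores), and the names `scores` and `freq` are chosen only for this example.

```python
from collections import Counter

# Hypothetical test scores for ten students (not data from the study guide).
scores = [3, 1, 0, 2, 3, 3, 1, 2, 3, 0]

# Count how many cases obtained each score: this is the frequency f.
freq = Counter(scores)

# Print the distribution in ascending order of score.
for score in sorted(freq):
    print(f"score {score}: f = {freq[score]}")

# The frequencies should add up to the number of cases in the sample.
total = sum(freq.values())
print("n =", total)
```

Because every case is counted exactly once, the sum of the frequencies equals the sample size.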
You will find that researchers usually do not include the column with the tally marks in
the final presentation of the frequency distribution table. The total frequency is written
in the third column and the sum of these frequencies (if you add them all up) should
be the same as the number of cases in the sample. The categories should be mutually
exclusive (a case cannot be classified in more than one category) and there should be
3
Frequency distribution refers to a table or graph indicating how observations are distributed.
sufficient categories so that every case can be classified into one of the available
categories.
We will use an imaginary study to illustrate tables and graphs. To make it easier for
you to understand the issues involved, we limit the number of cases in the sample.
Suppose a researcher does a study on aggression in adolescents. The researcher obtained
the following information for a convenience sample consisting of 20 secondary school
students: gender (male or female) and scores (ranging from 0 to 40) on an aggression
questionnaire. Even though this is a small sample, the researcher finds it difficult to
form an overall impression of the raw data (see table 8.1). However, if the researcher
organises the data according to gender (see table 8.2), he/she can immediately see that
more females than males were included in the study. Remember we said that this was
a convenience sample, which means that it is not necessarily representative of the
larger population. Can you see the advantage of descriptive statistics? Even though
you did not take part in the study, you can tell by looking at the table how gender was
distributed in the sample.
4
Grouped frequency table refers to a frequency distribution table with a limited number of categories.
TABLE 8.1
List of gender and aggression score
Consider the aggression scores in table 8.1. The highest value is 39 and the lowest
value is 8. If each score from 8 to 39 had to be a separate category, there would be
32 categories. This does not really help us to summarise the data. Table 8.3 is a
grouped frequency distribution of this data and you will see that the data are now
easier to interpret than the original list of aggression scores. It has been simplified and
you can contrast the number of students who obtained a low aggression score with
the number who obtained a high score. We can see that only one student obtained a
very low score (in the lowest interval), while five students obtained a relatively high
aggression score (35–41). Remember that some information is lost in a grouped
frequency distribution. For example, we can see that one person obtained a score
between 7 and 13, but we cannot infer the student’s exact score from the grouped
frequency table. One other thing that you should know about class intervals is that the
midpoint of the interval can be used to represent all the values in a particular interval.
For example, the midpoint of the interval 7–13 in table 8.3 is 10.
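The idea of class intervals and midpoints can be sketched as follows. The interval limits are those used in table 8.3; the helper names `interval_for` and `midpoint` are our own and are only meant as an illustration.

```python
# Class intervals as used in table 8.3 (lower and upper limits are inclusive).
INTERVALS = [(7, 13), (14, 20), (21, 27), (28, 34), (35, 41)]


def interval_for(score):
    """Return the class interval into which a score falls."""
    for low, high in INTERVALS:
        if low <= score <= high:
            return (low, high)
    raise ValueError(f"score {score} falls outside all class intervals")


def midpoint(interval):
    """Return the midpoint that represents all values in the interval."""
    low, high = interval
    return (low + high) / 2


# The lowest (8) and highest (39) aggression scores mentioned in the text.
print(interval_for(8), midpoint(interval_for(8)))
print(interval_for(39), midpoint(interval_for(39)))
```

Once a score has been placed in an interval, only the interval is kept, which is why the exact score can no longer be recovered from the grouped table.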
TABLE 8.3
Grouped frequency distribution table for aggression scores
Class interval   Tally        Frequency   Cumulative frequency
35–41            |||||        5           20
28–34            ||||         4           15
21–27            ||||| |||    8           11
14–20            ||           2           3
7–13             |            1           1
n = 20
Sometimes, we are concerned not with the frequencies within the class intervals, but
with the number of scores (frequencies) “greater than” or “less than” a specified value.
The cumulative frequency (cf) 5 of a class interval is the number of cases in the
specified interval plus all the cases in the previous intervals. In other words, the
cumulative frequency (cf ) of a class interval is the number of cases that fall below the
lower limit of the next interval. For example, from the last column in table 8.3, we can
conclude that 15 students had a score lower than 35. Can you see that a cumulative
frequency distribution would not be very useful for nominal data such as the data in
5
Cumulative frequency refers to a number of scores below (or above) a certain value.
table 8.2? For cumulative frequencies to be meaningful, the order of the categories
should make sense. By the way, did you notice that the cumulative frequency of the
highest class interval is equal to the total number of cases? Can you see why?
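The cumulative frequencies can be checked with a short sketch. The frequencies are those of table 8.3, listed from the lowest to the highest class interval; the variable names are our own.

```python
from itertools import accumulate

# Class intervals and frequencies from table 8.3, lowest interval first.
intervals = ["7-13", "14-20", "21-27", "28-34", "35-41"]
frequencies = [1, 2, 8, 4, 5]

# Each cumulative frequency is the frequency of the interval
# plus the frequencies of all lower intervals.
cumulative = list(accumulate(frequencies))

for label, f, cf in zip(intervals, frequencies, cumulative):
    print(f"{label}: f = {f}, cf = {cf}")
```

The cumulative frequency of the interval 28–34 is 15, which is the number of students with a score lower than 35, and the last value equals n = 20 because by then every case has been counted.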
8.4.2.2. Percentages
We have already mentioned percentages in activity 8.2. The percentage of a category,
a score value or a class interval indicates what part of the whole sample of scores that
category, value or class interval represents. Percentage is determined by dividing the
frequency by the total number of cases (n) and then multiplying it by 100 (100%
represents the whole sample). In table 8.3, we presented the frequency and
cumulative frequencies of aggression scores. The distribution of percentages for the
same set of scores is given in table 8.4. Percentages are useful, because not only is
the number of persons in a specific category or class interval taken into account, but
so is the total number of persons in the sample. The class interval 21–27 has the
highest frequency of students (8 students) and this is therefore also the interval with
the highest percentage (40%). But, if our sample included 200 students, 8 students
would represent only 4% of the sample.
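The percentage calculation described above can be sketched as follows, using the frequencies of table 8.3. The function name `percentage` is our own choice for this illustration.

```python
def percentage(frequency, n):
    """Frequency divided by the total number of cases, multiplied by 100."""
    return 100 * frequency / n


# Frequencies from table 8.3, lowest class interval (7-13) first.
frequencies = [1, 2, 8, 4, 5]
n = sum(frequencies)  # 20 students

percentages = [percentage(f, n) for f in frequencies]
print(percentages)

# The same 8 students in a sample of 200 form a much smaller part of the whole.
print(percentage(8, 200))
```

The percentages add up to 100, since together the class intervals cover the whole sample.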
TABLE 8.4
Distribution of the percentages and cumulative percentages for aggression
scores
Activity 8.3
Calculate the cumulative percentages for aggression scores and complete table 8.4
(last column). What percentage of students had a score lower than 35?
In the previous section, we showed how tables can be used to represent frequency
distributions. The same data can also be presented graphically. An example is the pie
chart in figure 8.1 — that is one way of representing categorical data. An important
advantage of graphs is that they make it easier to obtain an overall impression of the
data: a graph gives you a “picture” of a set of scores. This section deals with bar charts,
histograms, and polygons. These graphs consist of a horizontal line, called the X axis
or abscissa, and a vertical line or Y axis, called the ordinate. These two lines meet at
an angle of 90 degrees. The categories or score values appear on the X axis and the
number of scores (frequencies) appear on the Y axis.
Suppose the data that we collected are measured on a nominal level of measurement;
in other words, if our measurements are in the form of categories (i.e., gender
measured as male or female; marital status measured as never married, married or
divorced etc.), we can use a bar chart6 to visualise the frequency distribution of the
data. Points on the X axis represent the categories. For each category, a bar is drawn
and the height of this bar (measured on the Y axis) indicates the frequency or number
of cases that fall within that category. Because the categories represent separate
classes, the bars in a bar chart are drawn in such a way that they do not touch each
other. Figure 8.2 is an example of a bar chart. This figure represents the distribution
of gender in the study of aggression in secondary school students and is based on the
same data as table 8.2.
6
Bar chart refers to a graph representing the frequency distribution of categorical data.
FIGURE 8.2
Bar chart for gender (n = 20 students)
Histograms are used to illustrate the frequency distribution of numerical data (data
measured on an interval or ratio level of measurement). A bar chart reflects discrete
data (e.g., data that can be counted such as the number of students in class, total
number of staff members, etc.), whereas a histogram 7 is used for continuous data
(e.g., data that can be measured, such as temperature, weight, mass, etc.). The scores
or the midpoints of the class intervals are marked on the X axis and above each of these
a bar is drawn. The height of the bar, as measured on the Y axis, corresponds with
the frequency or the number of cases for that particular score or in that particular class
interval. The bars represent successive scores or class intervals and there are no
spaces between the bars. If we add up the frequencies represented by all the bars,
this will give us the total number of cases in our sample. The data in table 8.3 (class
intervals for aggression scores) have been visually presented in figure 8.3. This
histogram makes the differences and similarities between the various class intervals
apparent. For example, we can again see that only a small number of students
obtained a low score on the aggression questionnaire.
7
Histogram refers to a graph representing the frequency distribution of successive scores or class intervals.
FIGURE 8.3
Histogram for aggression scores (n = 20 students)
Rather than using bars to represent the frequencies, a mark which corresponds to the
score or the midpoint of each class interval can also be used. These marks (or
frequencies) are joined with straight lines, to draw a frequency polygon 8 that is
anchored on the X axis on both sides. In a histogram, we assume that all cases within
a class interval are uniformly distributed over the range of the interval, while in a
polygon we assume that the cases are concentrated at the midpoint of the interval.
Compare the polygon in figure 8.4 to the histogram in figure 8.3 and make sure that
you understand where the points in the polygon come from. A polygon can
accommodate more class intervals than a histogram. Smoothed polygons (the
midpoints are linked by curved lines) are frequently used to display the distribution of
scores for large data sets or populations.
The distributions of data differ in terms of central location (the middle point of the
distribution) and variation (the spread of the scores around the middle point). These
properties will be explained in sections 8.4 and 8.5.
8
Frequency polygons refers to a graph in which the frequencies of class intervals are connected by straight
lines.
Distributions also differ in skewness, that is, the symmetry or asymmetry of the
distribution. A distribution can be symmetrical — that is, it can have the same shape
on both sides of the middle point (i.e., the left and right sides are mirror images of each
other). If a distribution is asymmetrical and the larger frequencies are concentrated
towards the low end, it is said to be positively skewed (i.e., long tail towards the right
side). If the larger frequencies are concentrated toward the high end of the variable,
the distribution is negatively skewed (i.e., long tail towards the left side). Skewness is
illustrated in figure 8.5. Note that smooth curves are used. Whenever we deal with
large populations, we prefer to represent our frequency distributions as smooth curves.
We have already referred to this when we talked about smoothed frequency polygons.
FIGURE 8.4
Frequency polygon for aggression score (n = 20 students)
FIGURE 8.5
Three frequency distributions differing in skewness
FIGURE 8.6
We have seen that tables and graphs can be used to summarise data. It is also
possible to use single values to summarise the data obtained from a sample and to
describe the characteristics of the frequency distribution. Researchers often want to
know which score or value is central to a distribution and which can, therefore, be used
to summarise the entire distribution. A score or value which represents all the scores
in the sample is called a measure of central tendency. We will discuss three measures
of central tendency, namely the mode, the median and the mean.
If there are relatively few scores, it is easy to determine the mode, without using tables
or graphs. In the case of a large sample of scores, it might be easier to arrange the
scores in ascending or descending order or to work with frequency distributions. The
mode 9 is the score value with the highest frequency. For example, in the list, 23 26
28 37 37 37 45 48 49, the score that occurs with the highest frequency (three times)
is 37 and this is regarded as the mode. None of the other scores in this list occurs
more than once.
If two or more successive scores in a sample all have the highest frequency, the
average (this term will be explained later on in this section) of those scores is taken
as the mode of the distribution. However, if two values that do not follow on each other
both have the highest frequency, the sample has two modes. Such a distribution is
called bimodal (compared to a unimodal distribution with a single mode). If, for
example, the list that we gave you was, 23 26 28 37 37 37 45 48 49 49 49, there would
have been two score values that occurred three times and the distribution would be
bimodal.
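The two lists above can be checked with Python's standard `statistics` module. This is a minimal sketch: `multimode` simply returns every value that occurs with the highest frequency, and it does not apply the convention of averaging successive scores that share the highest frequency.

```python
import statistics

# The two lists of scores used in the text.
unimodal_scores = [23, 26, 28, 37, 37, 37, 45, 48, 49]
bimodal_scores = [23, 26, 28, 37, 37, 37, 45, 48, 49, 49, 49]

# multimode returns a list of all values with the highest frequency.
modes_first = statistics.multimode(unimodal_scores)
modes_second = statistics.multimode(bimodal_scores)

print(modes_first)   # one mode: the distribution is unimodal
print(modes_second)  # two modes: the distribution is bimodal
```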
If a distribution has two or more modes, these modes do not give a good indication of
the central tendency of the sample as a whole. In the case of a grouped frequency
distribution, the mode is equal to the midpoint of the class interval with the highest
frequency. A graphical representation of the distribution makes it easy to identify the
mode, since the class interval with the highest frequency will stand out above the
others. Take a look at the bar chart in figure 8.2. The mode in this example is the
category “female” and we therefore concluded that this was the largest category.
The mode is the only measure of central tendency that can meaningfully be used for
nominal data. If we are dealing with categories (e.g., different types of illnesses), it
does not make sense to order the types of illnesses and neither do the illnesses have
numerical values. Only the frequency of occurrence of each category is taken into
account when calculating the mode.
9
Mode refers to a score in a sample of scores that occurs with the greatest frequency.
To work out the median of a sample of scores, we first have to arrange the scores in
ascending or descending order. The median 10 is the value which falls right in the
middle of the list; in other words, half the scores in the sample fall below the median
and the other half above it. It is therefore the midmost score, that is, the score below
which 50% of all the scores fall. If the number of scores is an odd number, the median
is simply the score in the middle of the list. When the number of scores is an even
number, the middle of the list falls between two values and the median is the average
of these two scores. If several scores with the same numerical value occur near the
median (called tied scores), you will still use the position of the scores, after they have
been ordered, to determine the median. For example, in the list, 23 26 28 37 37 37 45
48 49, the score corresponding to the middle rank is 37 and this is regarded as the
median. In the case of a large sample, where the scores have been represented in a
frequency distribution, the median is calculated by means of a formula for grouped
data. This formula is also recommended in some cases where tied scores occur in a
list of scores, but these calculations do not form part of this module.
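The rule for an odd and an even number of scores can be illustrated with a short sketch. The first list is the one used in the text; the second list is hypothetical and is included only to show the even case. The function `median_of` is our own illustration of the procedure, not a formula for grouped data.

```python
def median_of(scores):
    """Median by position: order the scores, then take the middle value(s)."""
    ordered = sorted(scores)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of scores: the score in the middle of the list.
        return ordered[middle]
    # Even number of scores: the average of the two middle scores.
    return (ordered[middle - 1] + ordered[middle]) / 2


odd_list = [23, 26, 28, 37, 37, 37, 45, 48, 49]  # list from the text
even_list = [23, 26, 28, 37, 45, 48]             # hypothetical list

print(median_of(odd_list))
print(median_of(even_list))
```

In the first list the tied scores of 37 do not change the procedure: the median is still the score in the middle position.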
Both the mode and the median can be used with ordinal data, but the median is
preferred, because it takes into account the frequencies and the rank order of scores.
Suppose that the suburbs in the town or city where you live are ranked according to
density (the number of people living there). Low density is allocated the rank of 1,
average density 2, and high density 3. If ten suburbs are ranked 1, eight are ranked 2
and nine are ranked 3, then the mode (1) indicates the category with the highest
frequency. However, we cannot necessarily conclude that most of the suburbs were
low in density. If the set of ranks for all the suburbs are arranged in ascending order
(first all the ones, then all the twos, etc.), the middle value in this set (the median)
would be 2. At least half of the sample or 50% of the suburbs are therefore average
or high density (the scores in our list that fall above the median).
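The suburb example can be worked through in a few lines, as a sketch using the counts given above (ten suburbs ranked 1, eight ranked 2 and nine ranked 3). The variable names are our own.

```python
import statistics

# Ranks for the 27 suburbs in the example: 1 = low, 2 = average, 3 = high density.
ranks = [1] * 10 + [2] * 8 + [3] * 9

mode_rank = statistics.mode(ranks)      # category with the highest frequency
median_rank = statistics.median(ranks)  # middle value of the ordered ranks

# Number of suburbs that are average or high density (rank 2 or 3).
average_or_high = sum(1 for r in ranks if r >= 2)

print(mode_rank, median_rank, average_or_high, len(ranks))
```

Although the mode is 1, 17 of the 27 suburbs are of average or high density, which is what the median of 2 reflects.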
The mean11 of a sample of scores is the arithmetic midpoint of the scores and
represents all the scores in the sample. To calculate the mean, we add up all the
10
Median is a value or score such that half the observations fall above it and half below it.
11
Mean refers to a sum of a sample of scores divided by the number of scores in the sample.
scores and divide the sum by the total number of scores in the sample. We use the symbol x
to refer to the raw scores in the distribution of the variable x. As we already know, the
symbol n stands for the number of scores in the distribution.
The n measurements in a sample of scores are thus represented by the symbols x1,
x2, x3, ..., xn. The formula for the mean is
\[ \bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} \]
and this can also be written as:
\[ \bar{x} = \frac{\sum x}{n} \]
In this formula x̄ (pronounced x-bar) is the mean, ∑ means summate (or add up), x is
each raw score, and n is the sample size (the number of people in the sample).
Everything above the line (i.e., the sum of all scores) should be divided by everything
below the line (i.e., the number of scores in the sample). You are not expected to
memorise the formula, but being able to calculate the various statistics gives you a
better understanding of these statistics.
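To see the formula at work, the sketch below calculates the mean of the list of scores that was used earlier to illustrate the mode and the median. The variable names are our own.

```python
# The list of scores used earlier for the mode and the median.
scores = [23, 26, 28, 37, 37, 37, 45, 48, 49]

total = sum(scores)  # everything above the line: the sum of all scores
n = len(scores)      # everything below the line: the number of scores
mean = total / n

print(total, n, round(mean, 2))
```

Here the sum of the scores is 330 and there are 9 scores, so the mean is 330 divided by 9, or about 36.67.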
It is also possible to calculate the mean by using a frequency distribution. This might
be necessary if we are working with a large sample of scores. Each value of the
variable x is multiplied by the number of times it occurred (the frequency) and these
products are added together and divided by the total number of measurements. In the
case of a grouped frequency distribution, the midpoint of each interval may be used to
represent all values falling within the interval.
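As a sketch of this calculation for a grouped frequency distribution, we can use the class intervals and frequencies of table 8.3. Because each midpoint stands in for all the scores in its interval, the result is only an approximation of the mean of the raw scores. The variable names are our own.

```python
# Class intervals and frequencies from table 8.3, lowest interval first.
intervals = [(7, 13), (14, 20), (21, 27), (28, 34), (35, 41)]
frequencies = [1, 2, 8, 4, 5]

# The midpoint represents all values falling within an interval.
midpoints = [(low + high) / 2 for low, high in intervals]

# Multiply each midpoint by its frequency, add the products, divide by n.
n = sum(frequencies)
weighted_sum = sum(m * f for m, f in zip(midpoints, frequencies))
grouped_mean = weighted_sum / n

print(midpoints, weighted_sum, grouped_mean)
```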
All three measures of central tendency can be used in the case of interval and ratio
data, but the mean is usually chosen. When calculating the median, the particular
values of the variable are not taken into account, but only the occurrence of the values
above or below the middle value. Two studies on stress in executives were conducted
in different organisations and the following scores were obtained (the maximum
possible score on the stress questionnaire is 60). Study 1: 9 11 17 20 23 25 28; Study
2: 11 14 18 20 48 52 54. These samples of scores have the same median, but this
does not indicate that some of the executives in the second organisation have high
levels of stress that could influence their ability to do their job. The mean for these two
samples of scores would be 19 and 31, respectively, indicating that, in the case of the
second organisation, it might be necessary to pay more attention to the stress levels
of executives.
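The comparison of the two studies can be verified with the standard `statistics` module, as in the following sketch (the variable names are our own).

```python
import statistics

# Stress scores from the two studies in the text.
study_1 = [9, 11, 17, 20, 23, 25, 28]
study_2 = [11, 14, 18, 20, 48, 52, 54]

median_1 = statistics.median(study_1)
median_2 = statistics.median(study_2)
mean_1 = statistics.mean(study_1)
mean_2 = statistics.mean(study_2)

print("medians:", median_1, median_2)
print("means:", mean_1, mean_2)
```

Both medians are 20, because only the position of the middle score matters, whereas the three high scores in Study 2 pull its mean up to 31.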
Because all the values of the variable are used to calculate the mean, this is a more
appropriate measure of central tendency for interval and ratio data. The mean can be
used in mathematical calculations, whereas the mode and the median cannot. The
mean is also a more accurate and stable estimate of the population mean than the
other measures of central tendency. However, if there are one or two scores that differ
a great deal from the rest of the scores, this will influence the mean and the median is
then preferred. Remember, we called such a distribution skewed (refer to figure 8.5).
The mean, mode, and median of a symmetrical frequency distribution with a single mode will coincide.
Activity 8.4
When doing research, you will often have to decide what is an appropriate method to
present a particular data set. Which measure of central tendency do you think will be
best at showing household income in South Africa, and what is the rationale for your
answer?
In answering this question, you need to think about what the measures of central
tendency symbolise in practice. We have said that the mean is an appropriate
measure of central tendency for interval and ratio data; you might think that the mean
will best reflect household income. However, in most countries (including South Africa)
household income is positively skewed, meaning that far more people earn less money
than the mean, rather than the other way round (i.e., earn more than the mean).
Because there are a few extremely rich people, their income makes the mean higher
than most people’s income. The median is therefore a better indication of average
income in a country.
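The effect of a few very high incomes can be illustrated with made-up figures. The incomes below are entirely hypothetical and are not South African statistics; they serve only to show how a positively skewed set of values behaves.

```python
import statistics

# Hypothetical monthly household incomes: six modest incomes, one very high.
incomes = [3000, 4000, 4500, 5000, 6000, 7000, 250000]

mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)

# How many households earn less than the mean?
below_mean = sum(1 for income in incomes if income < mean_income)

print(round(mean_income, 2), median_income, below_mean)
```

In this made-up sample, six of the seven households earn less than the mean, while the median of 5000 lies in the middle of what most households earn.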
You now know that a sample of scores can be summarised and described by using
central values, such as the mode, median and mean. These values represent all the
scores in a sample. However, these central values do not indicate the extent to which
the scores in the sample differ from each other and how far they deviate from the
central value. The degree to which scores in a sample differ, that is, how spread out
they are, is called the variability of the scores12.
The simplest measure of variability is the range. In any sample of scores, the range is
taken as the difference between the highest and lowest scores. The range is a
measure of variability of scores in a sample, because it indicates the range of the
distribution of scores from the lowest to the highest. In the example we used on stress
in executives, the range for Study 1 is 19 (28 [the highest score] minus 9 [the lowest
score]), and for Study 2 it is 43 (54 minus 11). The scores in the second study clearly
exhibit greater variability (they are more scattered) than those in the first study. This
set of scores, consequently, has a much greater range and, again, this is a warning
that some of the executives in the second company are experiencing much more
stress than others (do you remember that this organisation also had the higher mean
stress score?).
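The two ranges can be checked with a short sketch (the function name `score_range` is our own).

```python
def score_range(scores):
    """Range: the highest score minus the lowest score."""
    return max(scores) - min(scores)


# Stress scores from the two studies in the text.
study_1 = [9, 11, 17, 20, 23, 25, 28]
study_2 = [11, 14, 18, 20, 48, 52, 54]

range_1 = score_range(study_1)  # 28 - 9
range_2 = score_range(study_2)  # 54 - 11

print(range_1, range_2)
```

Note that the range uses only the two extreme scores, so it says nothing about how the scores in between are spread.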
12 Variability of the scores refers to the extent to which scores differ from each other, or how spread out a group of
scores is in a frequency distribution.
The mean is the index of central tendency that best represents all the scores in the
sample. If we determine the extent to which each score in the sample differs from the
mean, then we have an indication of the extent to which all the scores differ from each
other (the variability). We could therefore determine variability by subtracting the mean
of the sample from each raw score in the sample. We call this difference a deviation
score (represented by x – x̄ for each value of x). This score indicates the extent to
which each raw score deviates from the mean. To determine variability, we could add
up the deviation scores, but some of the deviations about the mean are positive and
some are negative, so the sum of the deviations is always zero: the sum
of the deviation scores below the mean cancels out the sum of the deviation scores
above the mean. One method for getting rid of the negative values obtained from the
deviations below the mean is to square the deviations from the mean before we add
them up, that is, to multiply each deviation by itself. The variance 13 of a sample
of scores is calculated by dividing the sum of the squared deviation scores by the
number of scores minus one, to obtain (approximately) an average of the squared
deviation scores. The formula for this statistic is:
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
In this formula, s² is the variance, ∑ means to sum, x is each raw score, x̄ is the mean,
and n is the sample size. Note that, to obtain the “mean” of the squared deviations, we
divide their sum by n – 1 instead of n. When we are working with a sample of scores,
the sample variance is an estimator of the population variance, and a more accurate
estimate is obtained when n – 1 is used as the divisor. The explanation for this forms part
of inferential statistics, which is not covered in this module.
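The calculation can be followed step by step in a short sketch. The five raw scores are hypothetical; the variable names are our own. The result is compared with `statistics.variance` from the Python standard library, which also divides by n – 1.

```python
from statistics import variance

scores = [2, 4, 4, 6, 9]  # hypothetical raw scores
n = len(scores)
mean = sum(scores) / n

# Deviation score for each raw score: x minus the mean.
deviations = [x - mean for x in scores]
sum_of_deviations = sum(deviations)  # positive and negative deviations cancel

# Square each deviation so that negative values no longer cancel out.
squared_deviations = [d ** 2 for d in deviations]

sample_variance = sum(squared_deviations) / (n - 1)  # divisor used in the formula
divided_by_n = sum(squared_deviations) / n           # shown only for comparison

print("Sum of deviations:", sum_of_deviations)
print("Variance (n - 1): ", sample_variance)
print("Divided by n:     ", divided_by_n)
print("statistics.variance:", variance(scores))
```

Dividing by n – 1 always gives a slightly larger value than dividing by n, and the difference shrinks as the sample grows.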
The variance is a statistic in squared units. However, we would like to interpret the
meaning of the variability of a set of scores in terms of the original units of
measurement. We therefore calculate the square root of the variance and this is known
as the standard deviation 14 of a sample of scores:
$$s = \sqrt{s^2}$$
13 Variance is a measure of variability based on the deviation of each score in a distribution from the mean of
that distribution.
14 Standard deviation is an index of variability that is expressed in the same units as the original measures.
Both the variance and the standard deviation of a sample of scores indicate how far,
on average, the scores in a distribution lie from the mean. Because the
standard deviation is expressed in the same units as the original measure, researchers
prefer to use this statistic.
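A brief sketch shows the difference in units. The data are hypothetical hours of study per week; `variance` and `stdev` are the sample versions from the Python standard library (both use n – 1).

```python
from statistics import stdev, variance

hours = [4, 6, 8, 10, 12]  # hypothetical hours of study per week

s_squared = variance(hours)  # expressed in squared hours
s = stdev(hours)             # expressed in hours, like the raw scores

print(f"Variance:           {s_squared} (hours squared)")
print(f"Standard deviation: {s:.2f} (hours)")
```

A statement such as "students differ from the mean by roughly three hours" is easier to interpret than one expressed in squared hours.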
Activity 8.5
Suppose you are doing a study on social support for prisoners and their families. One
of the variables you are interested in is the number of years people spend in prison
(the term of imprisonment). On average, prisoners spend four years in prison and the
variation in the number of years spent in prison is indicated by the standard deviation
of three years. Explain why standard deviation is a better index of variability than
variance in this study.
8.4.5. Relationships
Until now, we have been discussing a single variable. In previous study units, however,
research studies have referred to two or more variables and the relationship between
these variables. We will briefly consider the direction and strength of the relationship
between two variables. If there is a relationship between two variables, it means that
a person’s position on one variable is related to his or her position on the other
variable.
A direct or positive relationship means that relatively high scores on one variable are
associated with relatively high scores on the other and relatively low scores on the first
correspond with relatively low scores on the second. An inverse or negative
relationship means that high scores on one variable correspond with low scores on
the other variable. If the variables are not related, changes on the one variable do not
correspond with changes on the other.
We refer to the statistical relationship between two variables as a correlation 15 and the
statistic used to describe this is called a correlation coefficient. It can range in value
from –1,00 to +1,00. These values represent a perfect negative (–1) or a perfect
positive correlation (+1). A value close to 0 indicates a weak relationship, while 0
means there is no linear relationship. The numerical size of a correlation
coefficient indicates the strength of the relationship, while the sign (positive/negative)
indicates the direction of the relationship. A positive correlation means that an increase
in one variable is associated with an increase in the other. A negative correlation
between two variables means that as the value of one variable increases, the value of
the other one decreases. Please note that a correlation between two variables does
not necessarily mean that one variable causes the other.
In correlational research studies, the researcher measures two or more variables for
everyone in the sample. The set of scores from the variables that the researcher is
interested in is correlated to establish if there is a relationship between the variables.
For example, a researcher may be interested in examining the relationship between
variable X (happiness) and Y (academic performance) in a sample of six students.
This means that the six students would have to be measured on variables X
(happiness) and Y (academic performance). The scores obtained from each student
participant are used to determine whether a relationship exists between happiness
and academic performance. Table 8.5 shows hypothetical scores obtained by a
researcher who was interested in determining whether there is a relationship between
happiness and academic performance among university students.
15 Correlation is the examination of the relationship between two or more variables.
Table 8.5
Happiness and academic performance scores
Student participant         A    B    C    D    E    F
Happiness (X)               3    6    8   13   16   20
Academic performance (Y)    4    6   10   14   18   24
The observation made from the results presented in table 8.5 is that low X scores are
associated with low Y scores and high X scores are associated with high Y scores. In
practice, this means that as happiness increases, academic performance also
increases and, as happiness decreases, academic performance also decreases. This
indicates a positive relationship between happiness and academic performance.
Conversely, a negative relationship would have been observed if high scores on
happiness were associated with low scores on academic performance, or high scores
on academic performance were associated with low scores on happiness.
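The strength and direction of this relationship can be checked with a short sketch using the scores from table 8.5. This learning unit does not specify which correlation coefficient is meant or how it is calculated, so the sketch assumes Pearson's product-moment coefficient, one commonly used option; the function name `pearson_r` is our own.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equally long lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of the products of paired deviation scores.
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviation scores for each variable.
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / sqrt(sum_xx * sum_yy)

# Scores from table 8.5 (student participants A to F).
happiness = [3, 6, 8, 13, 16, 20]
performance = [4, 6, 10, 14, 18, 24]

r = pearson_r(happiness, performance)
print(f"r = {r:.3f}")
```

For these scores the coefficient is positive and close to +1, which agrees with the strong positive relationship read off the table.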
In research, we may use graphical representations to present our results. One such
graphical representation is a scatter plot, which can be used to model the relationship
between variables. A scatter plot 16 allows for individual scores to be represented
graphically and to demonstrate how variables relate to one another. In other words,
the scatter plot accommodates the X and Y scores from each individual participant
and maps them out, to identify a relationship between two variables. As an example,
we use the scores from table 8.5 to draw a scatter plot depicting the relationship
between happiness and academic performance in figure 8.7.
16 Scatter plot: a graphical representation modelling the relationship between two variables.
[Scatter plot: Happiness (X) on the horizontal axis, Academic performance (Y) on the vertical axis, points labelled A to F]
Figure 8.7: Scatter plot demonstrating the relationship between happiness and
academic performance
The scatter plot shows the X and Y scores of student participants A, B, C,
D, E and F. It also indicates a positive relationship between happiness
and academic performance, as shown by the dotted line, which slopes from the bottom
left to the top right of the graph. This means that an increase in happiness is associated
with an increase in academic performance. If, however, an increase in
academic performance scores were associated with a decrease in happiness scores, a negative
relationship would be observed, and the dotted line would slope
from the top left to the bottom right of the graph. In the social sciences, researchers
seldom find a relationship between variables that is strong enough to resemble a straight
line as closely as the one depicted in figure 8.7. A relationship between variables that can be
expressed through a straight line is called a linear relationship17. A straight line
sloping from the top left to the bottom right indicates a negative linear
relationship, whereas a straight line sloping from the bottom left to the top right
indicates a positive linear relationship. The graphical representations of linear
relationships are illustrated in figure 8.9.
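The direction of the line through a scatter plot can also be checked numerically. The sketch below fits a straight line to the table 8.5 scores by least squares. This is an assumption on our part: the unit does not say how the dotted line in figure 8.7 was drawn, and the function name `fit_line` is our own.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum_xy / sum_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Scores from table 8.5 (student participants A to F).
happiness = [3, 6, 8, 13, 16, 20]
performance = [4, 6, 10, 14, 18, 24]

slope, intercept = fit_line(happiness, performance)
direction = "positive" if slope > 0 else "negative" if slope < 0 else "none"
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, direction: {direction}")
```

A positive slope corresponds to a line running from the bottom left to the top right of the graph, and a negative slope to a line running from the top left to the bottom right.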
17 Linear relationship is a relationship between two variables, positive or negative, that can be modelled by a straight line.
[Two scatter plots, panels (a) and (b)]
Figure 8.9: Scatter plot (a) depicts a negative linear relationship with a
correlation coefficient of (-1) and scatter plot (b) depicts a positive linear
relationship with a correlation coefficient of (+1)
Activity 8.6
Mr Stan, a high school principal, gave a report to the members of the School Governing
Body (SGB) that there was a sharp increase in the number of bullying cases reported
to his office in the previous year. As a result, learners’ marks for the previous year also
dropped significantly. As a researcher, how would you characterise the relationship
between the bullying cases reported and the drop in learners’ marks?
• The methods and procedures used in qualitative research are flexible and
responsive to the research findings as they emerge.
• Qualitative researchers collect relatively unstructured data to describe the
phenomena under investigation, based on the words or conduct of the
participants.
• The interest of qualitative researchers is not narrowly limited to the phenomena
of interest. They also pay attention to the natural contexts in which such
phenomena occur, as well as to their decisions as the study progresses.
• The study is broad enough to include an analysis of the
subjective experiences of both the researchers and the participants.
• Qualitative researchers often recommend locating a qualitative study
within a particular epistemological tradition, namely post-positivism,
pragmatism, phenomenology, interpretivism or constructivism, and critical,
normative science.
Now that we know what qualitative research is all about, we are shifting our focus to
the main business of this section of our study unit, which is qualitative data analysis.
Before we get deeper into the focus of this section, it is essential to set a tone by
borrowing from Flick’s (2014) emphasis on the significance of data analysis:
Flick’s emphasis is that, without data analysis, researchers will never realise the
study’s desired outcomes. This emphasis is particularly crucial in view of Lloyd’s
(2021) argument that data analysis does not only happen after the data collection, as
many would like to believe. It is rather a process which is conducted in two phases:
firstly, during the early stages of preliminary literature review, researchers analyse and
evaluate literature with the purpose of understanding the field, their specific research
topics and identifying any existing gaps in literature. Secondly, researchers conduct
data analysis of the data that has been collected for their research project (Lloyd,
2021).
This takes us to the next essential point regarding data analysis. Qualitative research
studies are conducted for various purposes, including to answer a particular research
question through data. Such data may be collected by reviewing existing literature or
documents, observing people as they engage in their normal activities, interviewing
people either individually or in a group discussion, or even analysing pictures and
sketches. The material that has been
collected from such literature, documents, interviews, group discussions or pictures is
called the data and, once collected, it must be analysed to ascertain its meaning in the
context of the research purpose and/or questions. Because qualitative data is collected
in various ways, it can also take various forms. According to
Nigatu (2009), qualitative data can take the form of structured text (writings, stories,
survey comments, news articles, books, etc.); unstructured text (transcriptions,
interviews, focus groups, conversations); or audio recordings, music, video
recordings and visual material (graphics, art, pictures, visuals).
Before we take you through the actual process of qualitative data analysis, it is
important to first help you understand the meaning of qualitative data analysis.
There is no universally accepted definition of the term;
researchers and authors have developed different definitions.
Fram (2013), for instance, defines qualitative data analysis as a variety of practices
and procedures through which researchers move from the raw qualitative data that
have been collected to some kind of explanation that makes it easier to understand or
interpret the meanings and situations of the people who are part of the
investigation. In another definition, Mezmir (2020) refers to qualitative data analysis
as a process through which researchers classify and interpret linguistic (or visual)
material, to make statements about implicit and explicit dimensions and structures,
through which meanings are created in the material and its representations. Moule
(2021), similarly, considers data analysis as an act of processing, summarising
and interpreting raw data into meaningful information. By processing, Ibrahim (2015)
means recasting and handling the data in such a way that it is ready for analysis.
For Ibrahim (2015), data analysis entails closely related operations, performed with
the purpose of summarising the collected data and organising it in such a way that it
yields answers to the research questions. A more comprehensive definition of
qualitative data analysis is the one provided by Flick (2014), who defines it as follows:
If you pay closer attention to these definitions, you will notice that, despite having been
coined by different authors and researchers, they share a common purpose, which
is to systematically reduce large volumes of data into smaller, manageable units for the
purpose of answering the research questions. Qualitative data analysis involves
collecting and piecing together all the pieces of data to develop a broader understanding
of their meaning.
The overall purpose of qualitative data analysis is to develop structure and meaning
from the collected data (Lloyd, 2021). You will remember that data
collected through interviews or group discussions, for instance, will be voluminous and
will not necessarily be structured. It will generally be difficult, if not impossible, to derive
any meaning from such data. Data analysis therefore assists in structuring
the data, so that it can ultimately have meaning. After all, collecting such material without
interrogating and interpreting it in line with the study purpose or questions would
render the entire process futile.
Three other aims of qualitative data analysis, as identified by Flick (2014), are (1) to
describe the phenomenon under investigation in greater detail; (2) to identify the
conditions on which the differences and commonalities between the cases (the
individuals or groups under investigation) can be derived; and (3) to develop theory
from the phenomenon under investigation by analysing empirical material. For Mezmir
(2020), qualitative data analysis serves three main aims, namely
(a) to describe the phenomenon in greater detail. This can, for instance, take the form
of explaining the subjective lived experiences of the participants.
(b) to explore the conditions on which existing differences are based, by looking
for explanations of the observed differences.
(c) to develop theory of the phenomenon under investigation, from analysis of the
empirical material.
Suppose the researcher asked the question: “What is the purpose of using headsets
while studying?” Given the various methods of data collection, such a question may
require the researcher to get
students to participate in an interview and answer a set of questions, the
answers to which will ultimately answer this research question. Alternatively, the
researcher might simply visit libraries where students are studying and observe them
as they study, to see how they manage to do so with headsets on their ears. The
researcher may also watch videos or simply read through literature on the subject.
Whatever method the researcher chooses, they will ultimately collect voluminous data,
which should be analysed. Such data will consist of different views expressed by the
students or different notes from observations, depending on the method used to collect
the data, some of which may not make any sense when considered in isolation.
The researcher will have to identify patterns in this voluminous data and connect
them to develop meaning in the context of the posed questions. This is
what data analysis seeks to achieve. Data analysis is like building a puzzle out
of many pieces of different patterns and colours. The volume of data to be
analysed can be compared to assorted pieces of a puzzle, which are pieced
together to create a bigger picture that ultimately makes sense. For the bigger
picture to appear accurate, the relevant pieces and correct colours must be correctly
positioned in their respective spaces. In this way, the reader can ultimately get an
understanding of what the pieces are and how they connect to create the bigger
picture.
Let’s take a moment to look at another example. Suppose a researcher
conducts a qualitative study, seeking to understand the participants’ experiences
of a giant animal that recently escaped from the Kruger National
Park in the middle of the night and is now roaming around in the communities. A
qualitative researcher might, in this instance, ask the participants the question: “How
would you describe your experiences during that night?”
animal. As part of the analysis, the researcher should ultimately connect all those pieces
of description, so that they make sense to the reader. Some of the pieces may not
really be useful, depending on what you want to achieve (your research aims and
questions). Figure 9.1 below illustrates data analysis as a metaphoric puzzle, whereby
each of the pieces is connected to create an image that makes sense, that is, a
comprehensive understanding.
From this figure, you can see that each piece of the puzzle has a specific role to play
in each respective space, both in giving colour and shape to the bigger structure. Even
in data analysis, each piece of material has a role to play in giving meaning to the data,
in the context of the posed research question(s).
Activity 8.7
Take a moment and look at the following picture and try to answer the questions
that follow.
Looking at the four individuals, what do you think they are doing? What do you think
about the stickers placed on the wall? What could be written thereon? In your view,
what is the whole exercise about?
In your attempt to answer the above questions, you would have noticed that this
exercise required a subjective interpretation of the picture in the context of the subject
of this study unit, which is data analysis. The first question required you to share
your thoughts regarding what the people in the picture are doing. Given the context
of the study, one might say they are engaged in manual data analysis. You can
assume this because you see each one trying to sort the stickers or place them in
a certain order. It could be that they are clustering similar data materials together.
The next question required you to share your thoughts on the contents of the stickers.
What do you think is on the stickers? It could be some data codes that were developed
from the raw data, which, as we explained earlier in our discussion, is more
comprehensive and voluminous. The stickers could have some labels or codes that
are used to identify certain data sets, with the eventual aim of creating meanings from
it. The entire exercise seems to be illustrative of a manual process for qualitative data
analysis.
Now that you can locate qualitative data analysis within the broader context of
qualitative research, define the concept of data analysis and explain its
purpose, it is essential to build further on this knowledge by practically analysing and
interpreting qualitative data. The purpose of this section is to further capacitate you
in this regard, through lessons on how to analyse qualitative data.
In setting the tone for data analysis, it is essential to begin with the insights penned by
Lloyd (2021), as quoted below:
Good analysis requires strong inductive analytical skills and good deal of creativity
(making connections across the data) in order to identify patterns and weave these
together in a meaningful and insightful way. Reporting qualitative analysis is also
tricky, because of the concise and structured way reporting is conducted;
subsequently, researchers need to make decisions about which aspects to include
and what to leave aside.
The above extract captures the essence of qualitative analysis, interpretation and
reporting. From what Lloyd is saying, one gets a sense that researchers are confronted
with comprehensive data, from which patterns should be created, through
linkages, to develop meaningful understanding. During this process, researchers
face a huge challenge, which requires a very strong analytical mind, with
the capacity to sort relevant data from that which is irrelevant, as they strive to create
meaningful knowledge. Although qualitative researchers prefer manually analysing
such voluminous data, technological advancements also play a crucial role in qualitative
data analysis, with software programmes such as NVivo and ATLAS.ti, among others,
being commonly used (Lloyd, 2021). For the purpose of this module, we will
focus on manual analysis. This, however, does not mean that technologically powered
analysis is discredited or unimportant. The main reason is that we believe that,
for you to have well-grounded basic knowledge, it is essential to begin with the
manual approach to data analysis. You will learn further about technologically driven
analysis as you advance your studies.
As much as there are diverse definitions of the concept of data analysis, there are also
several strategies for analysing data (Morse, 2020; Roulston, 2022), which include the
following: constant comparative analysis; phenomenological analysis; conversation
analysis; video analysis; content analysis; electronic analysis; narrative analysis and
discourse analysis (Maxwell & Chmiel, 2014; Mezmir, 2020; Roulston, 2022). Given
the scope of this section, we will not address each of these approaches beyond merely
describing them. You will learn more about them later in your studies in social
science research. Below, we provide only a brief explanation of each of the analysis
strategies. Our focus will be on thematic analysis, which is also explained further
below:
Narrative analysis is a type of analysis focusing on one person’s life, as told through
many interviews and interactions in the field. The focus is on features such as
gestures, sounds and the dynamics around the person’s speech acts, with the ultimate
purpose of understanding their biographical story (Katz-Buonincontro, 2022).
Another definition of narrative analysis is the one provided by Ntinda (2018), who holds
that narrative analysis refers to several procedures used to interpret the narratives
generated through research. Two forms of narrative analysis are formal structural
analysis and functional analysis. Formal structural analysis explores
how the story is structured, how it develops, and how it begins
and ends. Functional analysis, on the other hand, examines
what the narrative is doing or what the participant is conveying through the story
(Ntinda, 2018).
In analysing content, researchers often adopt a two-level approach, consisting of a
descriptive approach and an interpretative approach. A descriptive approach to content analysis
involves a description of the data, while an interpretative approach focuses on the
meaning of such data (Nigatu, 2009).
Braun and Clarke’s (2006) approach to data analysis involves a six-stage process as
outlined below:
• Step 2: Coding: This is the generation of pithy labels for important data
features relevant to the research question guiding the analysis. Through
coding, researchers also capture both a semantic and conceptual
reading of the data. They code every data item, and collate all codes and
relevant data extracts.
• Step 3: Searching for themes: Searching for themes is a bit like coding
codes to identify similarity in the data. This ‘searching’ is an active
process; themes are constructed by the researcher by collating all the
coded data relevant to each theme.
• Step 4: Reviewing themes: This involves checking that the themes ‘work’
in relation to both the coded extracts and the full data set. The researcher
should reflect on whether the themes tell a convincing and compelling
story about the data, and begin to define each individual theme, and the
relationship between the themes. Some themes may be collapsed
together, split into two or more themes, or even discarded
altogether, in which case the process of theme development begins again.
• Step 6: Writing up: This means telling the reader a coherent and
persuasive story about the data, by weaving together the analytic
narrative and (vivid) data in relation to existing literature.
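The collating described in the coding and theme-searching steps can be pictured as simple grouping. The sketch below is only an illustration of that bookkeeping: the extracts, code labels and theme names are all hypothetical, and deciding which codes belong under which theme remains the researcher's interpretive judgement, not something a program can decide.

```python
from collections import defaultdict

# Hypothetical (code, data extract) pairs produced during coding.
coded_extracts = [
    ("rejection at the door", "They say there is no patient here before we greet."),
    ("neighbours' gossip", "The neighbours asked why she let us in."),
    ("rejection at the door", "Some families do not open when we knock."),
    ("support from family", "Her mother always welcomes us."),
]

# Collate all data extracts under their code.
extracts_by_code = defaultdict(list)
for code, extract in coded_extracts:
    extracts_by_code[code].append(extract)

# Hypothetical grouping of codes into candidate themes,
# decided by the researcher.
code_to_theme = {
    "rejection at the door": "Stigma",
    "neighbours' gossip": "Stigma",
    "support from family": "Acceptance",
}

# Collate all coded extracts relevant to each candidate theme.
themes = defaultdict(list)
for code, extracts in extracts_by_code.items():
    themes[code_to_theme[code]].extend(extracts)

for theme, extracts in themes.items():
    print(f"{theme}: {len(extracts)} extract(s)")
```

In manual analysis, the same grouping is done with highlighters, sticky notes or index cards.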
For Erlingsson and Brysiewicz (2017), data analysis evolves through a four-stage
process as outlined below:
• Step 2: Dividing the text into meaning units and condensing them
Once the researcher is familiar with the data and the hermeneutic spiral, they
will divide the text into meaning units and begin to condense them.
• Step 2: Identifying the framework. During this stage, the researcher reads
data repeatedly and identifies a framework, guided by either the
explanatory or exploratory design. The identified framework, which is a
coding plan, will then structure, label and define the data.
• Step 3: Sort data into framework. During this stage of analysis, the
researcher will code the data by modifying the framework.
In the preceding presentation, we outlined the data analysis steps from various
authors. You would have noticed that some of the stages are common to the processes
proposed by the different authors. These commonalities confirm what was noticed by Braun and
Clarke (2006), that the stages of qualitative data analysis are not necessarily unique,
even to thematic analysis itself. In other words, some of the stages may also be found
to be similar, for instance, to those of discourse analysis. Familiarisation with the data
is one example of these commonalities, which is found in both Braun and
Clarke’s (2006) and Erlingsson and Brysiewicz’s (2017) approaches. Due to the
purpose and scope of this learning unit, we will only go deeper into one of the above
methods, to practically demonstrate how data analysis is conducted. For this
purpose, we will focus on Braun and Clarke’s approach.
Before you begin to familiarise yourself with the data, you need to ensure that such
data is properly prepared and readable. The data that you will be analysing may
have been given to you by research assistants or a team of your field workers. It
may be in the form of verbal interviews (audio recordings), speeches or
text in documents. Data that is not in text form will have to be transcribed
into text (Braun & Clarke, 2006). To transcribe means to transform spoken language
into text (Marying, 2022). Reading box 9.1, below, is an example of a transcription.
(Please note: The name of the participant is replaced with a code (P-1) in order to ensure her
anonymity, as required by research ethics principles. The numbers used next to the
alphabetical code refer to the line on the page where the remarks were found.)
B1 Researcher: Tell me about your experiences and where does that come from?
B2 Why are you caring for people?
B3 P-1: I think when you’re young and you’re trying to find yourself, many different things, and I found,
B4 I lived in Laudium, I worked there. I lived in Lotus for a little while, not very long. And I found in the
B5 areas that I lived in, the first area I was in was Laudium; I chose it because it’s very quiet. And we
B6 were a very open family, very open to discussions and stuff, and then I found that there was stuff
B7 they were doing there that really you don’t do. So, in that sense I used to be this figure of talking
B8 and always telling friends we don’t do it that way; let’s try. And from there I moved to Lotus and
B9 there I found a lot of shebeens and drug abuse. I found many children where the mother or the
B10 father was not there, and I found myself lost.
B13 Researcher: Okay, tell me about your...?
If you have collected the data yourself, you would, of course, have some
prior knowledge of or familiarity with it. This, however, does not mean you can
immediately begin analysing without further familiarising yourself with it. Braun and
Clarke (2006) propose that you immerse yourself in the data to the extent that you are
familiar with its breadth and depth. You should do so by actively and repeatedly
reading through the data, while searching for meanings, patterns and so on.
Now that you have familiarised yourself with the data by repeatedly reading through it,
you will then begin to produce some codes18. As defined by Marying (2022), coding
refers to the inductive or deductive process of identifying categories in the text. How coding
proceeds depends on whether the researcher follows a data-driven approach (inductive
approach) or a theory-driven approach (deductive approach) (Braun & Clarke, 2006).
A data-driven approach to qualitative data analysis involves developing themes purely
from the data set, while the theory-driven approach involves developing themes based on
a prior set of ideas or questions. In
coding, researchers work systematically through each data set, writing notes on the
text and highlighting potential patterns for easy identification of data segments. You will
do this throughout the data set and then collate similar codes. Now that you have gained
insight into the second step, complete activity 8.8 below.
18 A code is the result or product of coding.
Pay closer attention to the picture. Looking at the four individuals, what do you
think they are doing? What do you think about the stickers on the wall? What could
be written on them? In your view, what is the whole exercise about?
Reading box 9.2, below, serves as a demonstration of how the coding process is done.
It is an extract from an interview transcript on the experiences that home-based
caregivers, who care for people living with HIV, encounter as they perform their
duties.
N36 Researcher: And what is it that you hate about being a caregiver.
N37 Ms N: What I hate is when we walk and as you are still knocking at a particular household you get
N38 words such as “no we don’t have a patient here” even before you greet and introduce yourself.
N39 Researcher: How do they know that you are there for the patient?
N40 Ms N: These people talk, once we leave the households friends and neighbours would sneak in
N40 and say, “but why did you allow these people to come into your house, don’t you know that these
N41 people work with AIDS.” One of the patients’ mother had to tell us that her neighbours asked her
N42 why did she allow us in because we are working with AIDS. And what we would tell them is that we
N43 do not work only with AIDS. We work with all patients. She must also call us if she has a patient
N44 because sometimes you would find those people who has stroke left alone in the house, so who
N45 will look after that patient, no one. Once we get there we must bath him/her, help him/her do some
N46 exercise and feed him/her.
N47 Researcher: And how does that feel when you are not welcome in the houses?
N48 Ms N: It is painful. As you leave that household you will feel discouraged although you would
N49 console yourself that you are here to work and people are not the same. So we would not loose
N50 courage, we would go to the next house because we are here to work and we are here to help the
N51 community. We don’t care about those who don’t want us, one day they will need us. Many of them
N52 used to chase us away from their houses but eventually they would come referred to our offices
N53 seeking help by those who were our patients before. When you get there you realise that you were
N54 once informed about that patient but could not assist because you were chased away. As you leave
N55 a household where you are chased away, you would feel pain because you would be thinking about
N56 that patient who is hidden, without food, not bathed and often left alone. It is very painful because
N57 we are there for such kinds of patients.
A quick reflection on reading box 9.2: from the highlighted data segments, some patterns
can be noticed. The orange highlights, for instance, reflect the words expressed by
the participant about the bad treatment that caregivers received when they visited
the households. The common pattern here is that they were not welcome.
Looking at the green pattern, you will notice a coping strategy, which takes
the form of being resilient and continuing to do the work that they are called to do,
despite all the negativity they are confronted with. In the yellow highlights,
the pattern that emerges is a description of their main duties as caregivers.
In the red highlights, one gets a sense of the pain associated with
knowing the hardships that the patients are going through. This is an example of
a typical coding process that researchers conduct as part of the analysis. You
will be expected to do the same with all of your data sets, searching for
patterns, line by line, until all of your data sets have been examined. The codes
that may be extracted from the above exercise could be:
• Reception by members of the patient’s family
• Painful experience based on patients’ conditions
• Coping strategies
• Main duties of caregivers
Remember, the process is not cast in stone. Different researchers can come up with
different codes, based on the same data. Now that you have familiarised yourself with
the data and developed the codes, the next step is to search for themes.
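To make the coding step concrete, the sketch below records a few coded segments from reading box 9.2 and collates the extracts under each code. It is only an illustration: the line ranges attached to each extract, the shortened quotations and the names `coded_segments` and `collate_by_code` are assumptions made for this example, not the highlighting used in the reading box or a prescribed procedure.

```python
from collections import defaultdict

# Each coded segment is (line reference, shortened extract, code).
# The line ranges are illustrative assumptions; the original highlighting
# is not reproduced here.
coded_segments = [
    ("N37-N38", "no we don't have a patient here",
     "Reception by members of the patient's family"),
    ("N40-N41", "why did you allow these people to come into your house",
     "Reception by members of the patient's family"),
    ("N45-N46", "we must bath him/her, help him/her do some exercise and feed him/her",
     "Main duties of caregivers"),
    ("N49-N50", "we would go to the next house because we are here to work",
     "Coping strategies"),
    ("N55-N56", "you would feel pain because you would be thinking about that patient who is hidden",
     "Painful experience based on patients' conditions"),
]


def collate_by_code(segments):
    """Group the extracts under the code each one was assigned."""
    collated = defaultdict(list)
    for line_ref, extract, code in segments:
        collated[code].append((line_ref, extract))
    return dict(collated)


collated = collate_by_code(coded_segments)
for code, extracts in collated.items():
    print(f"{code}: {len(extracts)} extract(s)")
    for line_ref, extract in extracts:
        print(f"  [{line_ref}] {extract}")
```

Keeping the line reference next to each extract makes it easy to return to the transcript when a code is later questioned or renamed.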
During this stage, the analysis expands from narrowly focused coding to the
broader level of themes. It involves sorting through the different codes to identify
potential themes and collating all applicable data within each of the relevant themes.
Braun and Clarke (2006) recommend the use of visual representations, such as tables,
mind-maps, or pieces of paper organised into theme piles (just like you saw
in figure 8.2 above). During this process of sorting the codes into themes, it is possible
that some of the codes may be retained as themes, while others may become
subthemes.
As we continue with the extract from reading box 9.2, one might, based on the list of four codes that
were created, begin to search for themes. The researcher may decide to cluster two of the codes,
‘Reception by members of the patient’s family’ and ‘Painful experience based on patients’ conditions’,
under one theme: ‘the negative experiences of caregivers when interacting with members of the
patients’ families’. Under this theme, the two subthemes could be ‘negative reception by family
members of the patients’ and ‘the impact of patients’ conditions on caregivers’ emotional state’.
The remaining two codes, ‘coping strategies’ and ‘main duties of caregivers’, may each be elevated
to the status of a theme. Upon searching through your data set, you might come up with the
following themes and subthemes:
Theme 1: The negative experiences of caregivers when interacting with members of the
patients’ families
Once we leave the households friends and neighbours would sneak in and say, “but why did you
allow these people to come into your house, don’t you know that these people work with AIDS.” One
of the patients’ mother had to tell us that her neighbours asked her why she allowed us in because
we are working with AIDS.
Once we get there, we must bath him/her, help him/her do some exercise and feed him/her.
Remember, this is just an example. In a real situation, your themes and subthemes
will be supported by voluminous extracts from the interviews. Also remember that the
themes developed at this stage are not conclusive; they are what Braun and
Clarke (2006:20) call “candidate themes” until properly reviewed, which is the purpose
of the next stage.
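The sorting of codes into candidate themes can be sketched in a few lines of Python. The theme and code names come from the worked example in this unit; the dictionary layout, the shortened quotations and the helper names `collate_by_theme` and `unsorted_codes` are assumptions made purely for illustration.

```python
# Candidate themes and the codes sorted under each, following the worked
# example in the text. Storing them in a dictionary is an assumption made
# for illustration only.
candidate_themes = {
    "Negative experiences of caregivers when interacting with members of the patients' families": [
        "Reception by members of the patient's family",
        "Painful experience based on patients' conditions",
    ],
    "Coping strategies": ["Coping strategies"],
    "Main duties of caregivers": ["Main duties of caregivers"],
}

# Extracts collated per code during the coding step (shortened quotations).
extracts_by_code = {
    "Reception by members of the patient's family": [
        "why did you allow these people to come into your house",
    ],
    "Painful experience based on patients' conditions": [
        "you would feel pain because you would be thinking about that patient who is hidden",
    ],
    "Coping strategies": [
        "we would go to the next house because we are here to work",
    ],
    "Main duties of caregivers": [
        "we must bath him/her, help him/her do some exercise and feed him/her",
    ],
}


def collate_by_theme(themes, extracts):
    """Gather every extract whose code was sorted under each theme."""
    return {
        theme: [quote for code in codes for quote in extracts.get(code, [])]
        for theme, codes in themes.items()
    }


def unsorted_codes(themes, extracts):
    """Return codes that have extracts but are not yet placed under a theme."""
    placed = {code for codes in themes.values() for code in codes}
    return sorted(set(extracts) - placed)


theme_extracts = collate_by_theme(candidate_themes, extracts_by_code)
for theme, quotes in theme_extracts.items():
    print(f"{theme} ({len(quotes)} extract(s))")
print("Codes still to be sorted:", unsorted_codes(candidate_themes, extracts_by_code))
```

A table or mind-map on paper serves the same purpose; the point is simply that every code, with its extracts, ends up under a candidate theme or is consciously set aside.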
As researchers engage in the review process, they might find that some candidate
themes are not really themes, and that others should be collapsed into each other
or broken down into separate themes (Braun & Clarke, 2006). The process of
reviewing themes happens on two levels. Firstly, it involves
reading all the collated extracts for each theme to ensure coherence. Where
coherence exists, you move on to the next theme; where it does not, you
will have to decide whether to rework the problematic theme or relocate the extracts
to another theme (Braun & Clarke, 2006). Secondly, the review process involves
verifying whether the individual themes accurately reflect the
meanings embedded within the data set, with the accuracy thereof dependent on the
analytical approach adopted by the researcher (Braun & Clarke, 2006).
As indicated earlier, the process of reviewing themes takes two forms. In the first instance,
researchers spend time reading through the collated extracts under each theme, to ensure
that they are coherent. The main focus is on the extracts (storylines).
Once we leave the households friends and neighbours would sneak in and say, “but why did you
allow these people to come into your house, don’t you know that these people work with AIDS.” One
of the patients’ mother had to tell us that her neighbours asked her why she allowed us in because
we are working with AIDS.
“you would console yourself that you are here to work, and people are not the same. So, we would
not loose courage, we would go to the next house because we are here to work and we are here to
help the community. We don’t care about those who don’t want us, one day they will need us”.
Secondly, you will need to shift your focus to the themes themselves, to see
whether they reflect the meaning embedded within the data sets and, if not, you will have to
correct them accordingly.
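Judging whether the extracts under a theme are coherent is the researcher's task and cannot be automated. The bookkeeping that follows such a judgement can, however, be sketched. In the example below, the starting arrangement (a coping-related extract filed under the negative-experiences theme), the shortened theme name and the helper names `relocate_extract` and `thinly_supported` are all hypothetical.

```python
import copy


def relocate_extract(themes, extract, source, target):
    """Return a copy of `themes` with `extract` moved from `source` to `target`."""
    if extract not in themes.get(source, []):
        raise ValueError(f"extract is not filed under {source!r}")
    updated = copy.deepcopy(themes)
    updated[source].remove(extract)
    updated.setdefault(target, []).append(extract)
    return updated


def thinly_supported(themes, minimum=1):
    """List themes that hold fewer than `minimum` extracts."""
    return [name for name, extracts in themes.items() if len(extracts) < minimum]


# Hypothetical starting point: the second extract was filed under the wrong theme.
negative = "Negative experiences of caregivers"
coping = "Coping strategies"
themes = {
    negative: [
        "why did you allow these people to come into your house",
        "we would go to the next house because we are here to work",
    ],
    coping: [],
}

print("Before review:", thinly_supported(themes))
reviewed = relocate_extract(themes, themes[negative][1], negative, coping)
print("After review:", thinly_supported(reviewed))
```

Working on a copy leaves the earlier arrangement intact, so you can compare the themes before and after the review.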
Once the researcher is satisfied with the themes developed in step 3 and reviewed
in step 4, they will proceed with the fifth step, which is to define and name the themes.
In defining and naming the themes, researchers identify the essence of what each
theme means (Braun & Clarke, 2006). At this stage, the researcher will revisit the data
extracts of each theme and organise them coherently and consistently, together with
the accompanying narrative accounts of the participants and, where necessary, rename
the themes and subthemes, making sure the names are concise and punchy (Braun & Clarke,
2006).
Reading box 9.5: An example of how a researcher could rename the themes
In renaming the themes, researchers are guided by the essence of the meanings embedded in
the data extracts as well as the overall coherence with the other themes. Following the example
provided in step 3, one might rename the themes as follows:
Once we leave the households friends and neighbours would sneak in and say, “but why did you
allow these people to come into your house, don’t you know that these people work with AIDS.” One
of the patients’ mother had to tell us that her neighbours asked her why she allowed us in because
we are working with AIDS.
“you would console yourself that you are here to work, and people are not the same. So, we would
not loose courage, we would go to the next house because we are here to work and we are here to
help the community. We don’t care about those who don’t want us, one day they will need us”.
Remember, there are no hard-and-fast rules. What should guide researchers is the extracts
as a whole, read in the context of the other themes. You might come up with different themes, as long as
they accurately reflect the essence of your storylines or extracts.
Once the themes are accurately named and defined, the researcher will then move to
the last step, which is the reporting stage.
Based on the final themes, as supported by the extracts, the researcher will now begin
to tell the story of the data coherently, logically, concisely and without repetition.
The themes should be supported by enough data extracts or storylines, which
serve as evidence of the prevalence of a particular theme (Braun &
Clarke, 2006). Although Braun and Clarke do not recommend a format for a data
analysis report, they suggest that such a report should be a scholarly one. In other
words, your data should be presented logically and coherently, in the context
of the literature, including the adopted theories. It is in the report that your interpretation
of the data takes place. In terms of structure, a research report on data
presentation will generally open with an introduction, followed by a description of
the biographical profiles of the research participants.
After the biographical profiles, the themes and subthemes are presented. An
example of the way a report can be presented is provided in reading box 9.6 below:
1. Introduction
Through this study, the researcher sought to understand the experiences and challenges
faced by caregivers when rendering services to people living with HIV, in the province of
Gauteng, South Africa. As part of the study, participants were asked to answer five
questions about their experiences as caregivers, posed in a semi-structured interview.
The findings of the study are presented in this section of the report, in the
form of biographical profiles of the participants as well as the themes and subthemes that
emerged from the process of data analysis.
A total of fifteen caregivers were identified and recruited to take part in this study. Of the
fifteen, six were males and nine were females. Their ages varied between 24 and 55 years
old. Three of them were 55 years old, three were 30, 31 and 44, while two were 24 and 50,
respectively. Four of them were 34, 36, 37 and 39, respectively, while three were 40, 45
and 33, respectively. Looking at their ages, one gets a sense of an intergenerational mix.
Each of the features (age, gender, race, etc.) should be discussed in the context of
existing literature. You must clearly explain what other studies found regarding such
features, how those findings differ from or resemble what your study has found, and
then draw your own conclusion.
Note: The biographical profiles of the participants can include various features,
depending on the purpose of the study. One might include things like the sources of
income, family composition, educational credentials, work experience and others.
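The age profile reported in the example above can also be summarised with the descriptive statistics covered earlier in this unit. The sketch below uses the fifteen ages listed in the example and Python's standard `statistics` module; the variable names are chosen for this illustration only.

```python
from statistics import mean, median, mode

# Ages of the fifteen participants, as listed in the example report.
ages = [55, 55, 55, 30, 31, 44, 24, 50, 34, 36, 37, 39, 40, 45, 33]

summary = {
    "n": len(ages),
    "minimum": min(ages),
    "maximum": max(ages),
    "range": max(ages) - min(ages),
    "mean": round(mean(ages), 1),  # rounded to one decimal place
    "median": median(ages),
    "mode": mode(ages),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

With these values, the profile could be reported as fifteen participants aged 24 to 55, with a mean age of about 40.5 years and a median of 39.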
19 The biographical profile of the participants (i.e., their ages, socioeconomic conditions, educational
qualifications) is one of the crucial ways in which the context is enhanced.
Regarding the findings on thematic analysis, two main themes and four subthemes
emerged. These themes and subthemes are introduced and explained below.
Once we leave the households friends and neighbours would sneak in and say, “but why
did you allow these people to come into your house, don’t you know that these people work
with AIDS.” One of the patients’ mother had to tell us that her neighbours asked her why did
she allow us in because we are working with AIDS.
As you did with the biographical profiles of the participants, you need to explain the
findings under each theme, using existing literature as well as your adopted
theoretical framework. You must explain the findings in the context of such literature
and the theory/theories.
8.6. CONCLUSION
In this learning unit, we have introduced you to several techniques for analysing and
interpreting quantitative and qualitative data. From the quantitative research approach,
the key methods discussed in this unit included descriptive statistics and their purpose
in data analysis. We also discussed frequency distribution tables and graphs, as
methods of summarising and organising quantitative data. The measures of central
tendency were discussed in detail, to help you understand which measure
may be useful for your data; by now you should be able to
calculate the measures of central tendency using the formulas provided. We have
also discussed the measures of variability and their purpose in data analysis. The unit
also covered a brief discussion of correlations and how graphical
representations from correlational analysis can be interpreted. From a qualitative
research perspective, we introduced the meaning of qualitative data analysis and the
purpose thereof. We also introduced, defined, and explained the various strategies or
methods that can be used to assist you in analysing various forms of qualitative data
and provided relevant practical scenarios to further enhance your knowledge. To
assist you in measuring your level of understanding, we provided a self-evaluation
exercise. Please work through the questions with the necessary dedication and contact us
if you have any questions or need clarity.
This section aims to test your level of understanding of the content presented in this
learning unit.
• Are you able to differentiate between quantitative and qualitative data analysis?
• Are you able to indicate which data can be suitably used for graphs and tables?
• Are you able to differentiate between the measures of central tendency and
describe their purpose in data analysis?
• Are you able to define the measures of variability and describe their purpose in
data analysis?
• Are you able to define the concept of correlation and describe the context in
which you can use correlations to analyse your data?
• Are you able to identify and define various methods of qualitative data analysis?
• Are you able to explain how thematic analysis is implemented, following a step-
by-step process?
This section aims to enhance your learning experience on some of the learning
outcomes addressed in this learning unit. Please use the links below to watch
YouTube videos after reading the learning unit and answering the self-evaluation
assessment questions.
YouTube links
Overview - ATLAS ti 22 Windows - Bing video. (Electronic data analysis - Atlas ti 6.0)
Qualitative Content Analysis 101: The What, Why & How (With Examples) - YouTube
(Content analysis)
Jhangiani, R.S., Chiang, I. A., Cuttler, C., & Leighton, D.C. 2019. Research methods
in psychology. Kwantlen Polytechnic University.
https://kpu.pressbooks.pub/psychmethods4e/
Flick, U. 2014. The SAGE Handbook of qualitative data analysis. London: Sage.
https://methods.sagepub.com/book/the-sage-handbook-of-qualitaive-data-analysis.
8.10. REFERENCES
Bhome, S., Chandwani, V., Iyer, S., Prabhudesai, A., Jha, N., Desai, S., & Koshti, S.D.
2013. Research methodology. Himalaya publishing house.
Fram, S.W. 2013. The constant comparative analysis method outside of grounded
theory. The Qualitative Report, 18(1): 1-25.
HSRC. 2010. South African social attitudes. Human Sciences Research Council.
Ibrahim, M. 2015. The art of analysis. Journal of Allied Health Sciences Pakistan,
1(1):98-104.
http://www.uop.edu.pk/ocontents/Lecture%201%20B%20Qualitative%20Research.pdf.
(Accessed on 5 April 2023).
Isabirye, A.K., & Makoe, M. 2018. Phenomenological analysis of the lived experiences
of academics who participated in the professional development programme at an open
distance learning (ODL) university in South Africa. Indo-Pacific Journal of
Phenomenology, 18(1):1-11.
Jhangiani, R.S., Chiang, I. A., Cuttler, C., & Leighton, D.C. 2019. Research methods
in psychology. Canada: Kwantlen Polytechnic University.
https://kpu.pressbooks.pub/psychmethods4e/
Knoblauch, H., Tuma, R., & Schnettler, B. 2014. Video analysis and videography.
In U. Flick (ed), The Sage handbook of qualitative data analysis. London: Sage. (pp.
21-34)
Maxwell, J.A. & Chmiel, M. 2014. Notes toward a theory of qualitative data analysis.
In U. Flick (ed), The Sage handbook of qualitative data analysis. London: Sage. (pp.
21-34)
Mezmir, E.A. 2020. Qualitative data analysis: an overview of data reduction, data
display and interpretation. Research on Humanities and Social Sciences, 10(21): 15-27.
Moule, P. 2021. Making sense of research in nursing health and social care. London:
Sage.
Ngulube, P. 2015. Qualitative data analysis and interpretation: systematic search for
meaning. In E.R. Mathipa & M.T. Gumbo (eds), Addressing research challenges:
Making headway for developing researchers. Mosala-MASEDI Publishers &
Booksellers cc: Noordwyk, pp. 131-156.
Ntinda, K. 2018. Narrative research. In P. Liamputtong (ed), Handbook of research
methods in health social sciences. Singapore: Springer. (Accessed on 05 April 2023).
Peck, R., Olsen, C., & Devore, J. L. 2015. Introduction to statistics and data analysis.
Cengage Learning.
Yegidis, B.L., Weinbach, B.W. & Myers, L.L. 2018. Research methods for social
workers. 8th Edition. New York: Pearson.