Data Analysis - PHARM D 2025-Students

Research and Methodology

Uploaded by

Sylvester Asare
Copyright © All Rights Reserved

DATA ANALYSIS AND PRESENTATION
VICTOR MOGRE (Ph.D.)
School of Medicine, University for Development Studies, Tamale.
Learning objectives
• To explain data analysis and why it is necessary
• To understand the different kinds of data
analysis and when to use them
• To explain categorical and continuous variables
• To be able to perform data analysis using
appropriate statistical tools
• To understand how to present one’s research
findings

Project cycle

What is data analysis?

Purpose

Reasons for using statistics
• aids in summarizing the results
• helps us recognize underlying trends and
tendencies in the data
• aids in communicating the results to others
The workflow for data analysis

Statistical terms
• Population
• complete set of individuals, objects or measurements
• Sample
• a sub-set of a population
• Variable
• a characteristic which may take on different values
• Data
• numbers or measurements collected
• A parameter is a characteristic of a population
• e.g., the average height of all Ghanaians.
• A statistic is a characteristic of a sample
• e.g., the average height of a sample of Ghanaians.
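A minimal Python sketch (standard library only) of the parameter/statistic distinction, assuming a small invented population of heights in centimetres: the parameter is computed from every member of the population, while the statistic is computed from a random sample and only estimates it.

```python
import random
import statistics

# Hypothetical population: heights (cm) of every member of a small population.
population = [150 + (i * 7) % 40 for i in range(1000)]

# Parameter: a characteristic of the whole population.
population_mean = statistics.mean(population)

# Statistic: the same characteristic computed from a random sample.
random.seed(1)  # fixed seed so the sketch is reproducible
sample = random.sample(population, 50)
sample_mean = statistics.mean(sample)

print(f"Parameter (population mean): {population_mean:.2f}")
print(f"Statistic (sample mean, n = 50): {sample_mean:.2f}")
```

A different random sample would give a slightly different statistic, while the parameter stays fixed.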
Sample vs. Population
[Figure: Population and Sample]
What needs to be done before data analysis?
• Research questions/objectives are well defined.
• Variables are well defined
• Data collection is completed
• Be very clear in your mind about what you are
looking for
• What is your sample size?
• How many variables were investigated?

Types of variables

1. Categorical: (e.g., Sex, Marital Status, income category)
2. Continuous: (e.g., Age, income, weight, height, time to achieve an outcome)
3. Discrete: (e.g., Number of Children in a family)
4. Binary or Dichotomous: (e.g., responses to any Yes or No type of question)
Brain Size and IQ
What types of data do these variables represent?

Gender  FSIQ   VIQ   PIQ  Weight (lb)  Height (in)  MRI Count
Female   133   132   124          118         64.5     816932
Male     140   150   124          124         72.5    1001121
Male     139   123   150          143         73.3    1038437
Male     133   129   128          172         68.8     965353
Female   137   132   134          147           65     951545
Female    99    90   110          146           69     928799
Female   138   136   131          138         64.5     991305
Female    92    90    98          175           66     854258
Male      89    93    84          134         66.3     904858
Male     133   114   147          172         68.8     955466
Female   132   129   124          118         64.5     833868

FSIQ, VIQ and PIQ: full-scale, verbal and performance IQ scores.
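The variable type decides how it is summarised. This Python sketch uses the Gender and Height columns of the Brain Size and IQ table (heights taken to be in inches): the categorical variable is summarised with counts and percentages, the continuous one with a mean and standard deviation.

```python
from collections import Counter
import statistics

# Rows from the Brain Size and IQ table: (Gender, Height in inches)
rows = [
    ("Female", 64.5), ("Male", 72.5), ("Male", 73.3), ("Male", 68.8),
    ("Female", 65.0), ("Female", 69.0), ("Female", 64.5), ("Female", 66.0),
    ("Male", 66.3), ("Male", 68.8), ("Female", 64.5),
]

# Categorical (nominal) variable: counts and percentages.
gender_counts = Counter(gender for gender, _ in rows)
for gender, count in gender_counts.items():
    print(f"{gender}: {count} ({100 * count / len(rows):.1f}%)")

# Continuous variable: mean and sample standard deviation.
heights = [height for _, height in rows]
print(f"Height: mean = {statistics.mean(heights):.2f}, "
      f"SD = {statistics.stdev(heights):.2f}")
```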
Measuring scales
1. Nominal: These data do not represent an amount or quantity (e.g., Marital Status, Sex)
2. Ordinal: These data represent categories with a meaningful order, but the distances between them are not necessarily equal (e.g., level of education)
3. Interval: These data are measured on a scale with equal units but an arbitrary zero point (e.g., temperature in Fahrenheit)
4. Ratio: These data have equal units and a true zero point, so ratios are meaningful (e.g., weight: 100 kg is twice 50 kg)
More information on types of variables
• Nominal Categories
• Names or types with no inherent ordering:
• Religious affiliation
• Marital status
• Race
• Ordinal Categories
• Variables with a rank or order
• University Rankings
• Academic position
• Level of education
Types of Variables (con’t)

• Interval/Ratio Categories
• Variables are ordered and have equal space between
them
• Salary measured in dollars
• Age of an individual
• Number of children
• Thermometer scores measure intensity on issues
• Views on abortion
• Whether someone likes China
• Feelings about US-Japan trade issues
• Any dichotomous variable coded 0/1 can also be treated this way
• A variable that takes on the value 0 or 1
• e.g., a person’s gender (M=0, F=1)
Measuring Data
• Discrete Data
• Takes on only integer values
• whole numbers no decimals
• E.g., number of people in the room
• Continuous Data
• Can take any value within a range
• Includes fractional values (decimals)
• E.g., Rate of population increase
Discrete Data
Example: Absenteeism for 50 employees
(Column IV uses the mean number of absences, X̄ = 4.86)

I          II          III              IV         V
Absences   Frequency   Relative         x − X̄     f(x − X̄)²
x          f           frequency f/N
0          3           .06              -4.86      70.8
1          2           .04              -3.86      29.8
2          5           .10              -2.86      40.9
3          8           .16              -1.86      27.7
4          7           .14              -0.86       5.2
5          2           .04               0.14        .04
6         13           .26               1.14      16.9
7          2           .04               2.14       9.2
8          4           .08               3.14      39.4
9          0           .00               4.14       0
10         1           .02               5.14      26.4
11         2           .04               6.14      75.4
12         0           .00               7.14       0
13         1           .02               8.14      66.3
           N = 50      1.00                        Σ = 408.02
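The columns of the absenteeism table can be reproduced in a few lines. A minimal Python sketch using the table's own frequencies; the final line, dividing the sum of squares by N − 1 to get a sample variance, is an illustrative addition not shown in the table.

```python
# Absences (x) and number of employees (f) from the absenteeism table.
absences = list(range(14))
freq = [3, 2, 5, 8, 7, 2, 13, 2, 4, 0, 1, 2, 0, 1]

n = sum(freq)                                                 # N = 50
rel_freq = [f / n for f in freq]                              # column III: f/N
mean = sum(x * f for x, f in zip(absences, freq)) / n         # X-bar
deviations = [x - mean for x in absences]                     # column IV: x - X-bar
weighted_sq = [f * d ** 2 for f, d in zip(freq, deviations)]  # column V: f(x - X-bar)^2
sum_of_squares = sum(weighted_sq)

print(f"N = {n}, mean = {mean:.2f}, sum of squares = {sum_of_squares:.2f}")
print(f"Sample variance (N - 1) = {sum_of_squares / (n - 1):.2f}")
```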
Discrete Data (con’t)
• Frequency
• Number of times we observe an event
• In our example, the number of employees who were absent for a particular number of days.
• Three employees were absent on no days (0) throughout the year
• Relative Frequency
• Number of times a particular event takes place in relation to the total (f/N).
• For example, about a quarter of the employees (.26) were absent exactly 6 days in the year.
Continuous Data

Example: Men’s Heights

Height      Midpoint   Frequency (f)   Relative Frequency (f / N)   Percentile (Cumulative Frequency)
58.5-61.5   60           4             .02                          .02
61.5-64.5   63          12             .06                          .08
64.5-67.5   66          44             .22                          .30
67.5-70.5   69          64             .32                          .62
70.5-73.5   72          56             .28                          .90
73.5-76.5   75          16             .08                          .98
76.5-79.5   78           4             .02                          1.00
Total                  N = 200         1.00
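A Python sketch of how the relative and cumulative columns of the men's heights table are built. It also estimates the mean from grouped data by treating every value in a class as its midpoint, which is an approximation added for illustration. The heights appear to be in inches.

```python
from itertools import accumulate

# Class midpoints and frequencies from the men's heights table.
midpoints = [60, 63, 66, 69, 72, 75, 78]
freq = [4, 12, 44, 64, 56, 16, 4]

n = sum(freq)                                      # N = 200
rel_freq = [f / n for f in freq]                   # relative frequency (f/N)
cum_rel_freq = [c / n for c in accumulate(freq)]   # cumulative relative frequency

# Grouped-data mean: treat every value in a class as its midpoint.
grouped_mean = sum(m * f for m, f in zip(midpoints, freq)) / n

for m, rf, crf in zip(midpoints, rel_freq, cum_rel_freq):
    print(f"{m}: {rf:.2f}  cumulative {crf:.2f}")
print(f"Approximate mean height = {grouped_mean:.1f}")
```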
Collate your data

• Try to put your data in one place, for example:
• Excel spreadsheets
• Statistical software such as SPSS
• NVivo for qualitative research
• For quantitative data, ensure the data are cleaned and well coded to allow for easy analysis
• Consult a statistician to help with the analysis
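Cleaning and coding can also be scripted before the data go into SPSS. A minimal Python sketch using the standard `csv` module. The column names, the sex codes (1 = male, 2 = female) and the 0 to 120 plausible age range are hypothetical choices for illustration, not a standard scheme.

```python
import csv
import io

# Hypothetical raw data export; column names and codes are illustrative only.
raw = io.StringIO(
    "id,sex,age\n"
    "1,Male,34\n"
    "2, female ,29\n"
    "3,M,\n"
    "4,F,340\n"
)

SEX_CODES = {"male": 1, "m": 1, "female": 2, "f": 2}  # hypothetical coding scheme

clean, problems = [], []
for row in csv.DictReader(raw):
    sex = SEX_CODES.get(row["sex"].strip().lower())        # harmonise spelling, then code
    age = int(row["age"]) if row["age"].strip() else None  # blank cell -> missing
    if sex is None:
        problems.append((row["id"], "unrecognised sex"))
    if age is None:
        problems.append((row["id"], "missing age"))
    elif not 0 <= age <= 120:
        problems.append((row["id"], f"implausible age {age}"))  # likely entry error
        age = None
    clean.append({"id": int(row["id"]), "sex": sex, "age": age})

print(clean)
print(problems)
```

Flagged records should be checked against the original questionnaires rather than silently corrected.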

The kind of data analysis you choose depends on your research questions or objectives

Statistics: What’s What?
◼ Descriptive objectives/questions: Descriptive statistics
◼ Comparative research objectives/hypotheses: Inferential statistics

Descriptive analysis
• Helps one to understand the sample or the study population
• Provides a description of the basic characteristics
• e.g., age, gender, number of children
• Common basic analyses include:
• mean, median, mode, range, standard deviation
• frequencies, percentiles, interquartile range
• Provides an initial overview of the data and can inform further analysis decisions

Descriptive Statistics
• Frequencies, percentiles
• Central Tendency
• Mean
• Median
• Mode

p.396 Sekaran
Frequency distribution

• The frequency with which observations are assigned to each category or point on a measurement scale.
• Most basic form of descriptive statistics
• May be expressed as a percentage of the total sample
found in each category
Frequency Table
• Generally, the first approach to examining
your data.
• Identifies distribution of variables overall
• Identifies potential outliers
• Investigate outliers as possible data entry errors
• Check a sample of the other records for data entry errors as well
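A frequency table is also a quick way to catch impossible codes. A Python sketch assuming a hypothetical marital-status variable coded 1 to 3; any other code is listed as a row to check for a data entry error.

```python
from collections import Counter

# Hypothetical codes: 1 = married, 2 = single, 3 = divorced/widowed.
# Any other value is treated as a possible data entry error.
VALID = {1: "Married", 2: "Single", 3: "Divorced/widowed"}
entries = [1, 2, 1, 1, 3, 2, 1, 7, 2, 1]

counts = Counter(entries)
n = len(entries)
for code, count in sorted(counts.items()):
    label = VALID.get(code, "INVALID - check data entry")
    print(f"{code} {label:<28} {count:>3} {100 * count / n:5.1f}%")

# Positions (0-based) of records whose code is not in the coding scheme.
suspect = [i for i, code in enumerate(entries) if code not in VALID]
print("Rows to check:", suspect)
```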
Measures of Central Tendency
• Mode
• The category or score occurring most often in a distribution
• Good for nominal data
• Median
• The middle observation or 50th percentile
• Mean
• The average of the observations
• The ‘average’ score: the sum of all individual scores divided by the number of scores
• Many statistics are based on the mean
• Has a number of useful statistical properties
• However, can be sensitive to extreme scores (“outliers”)
Source: www.wilderdom.com/.../L2-1UnderstandingIQ.html
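The sensitivity of the mean to outliers is easy to demonstrate. A Python sketch on invented scores: adding one extreme value moves the mean a lot, the median only slightly, and the mode not at all.

```python
import statistics

scores = [2, 3, 3, 4, 5, 6, 7]   # hypothetical scores
with_outlier = scores + [60]     # the same scores plus one extreme value

for label, data in [("original", scores), ("with outlier", with_outlier)]:
    print(
        f"{label}: mean = {statistics.mean(data):.2f}, "
        f"median = {statistics.median(data)}, mode = {statistics.mode(data)}"
    )
```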
Descriptive Statistics
• Variability
• Range
• The difference between the largest and smallest observations
• Limited measure of dispersion: it depends only on the two most extreme values
• Inter-quartile range
• Divide the ordered observations into quarters and use the middle half
• The difference between the 75th percentile and the 25th percentile in the data
• Better than the range because it is not affected by extreme values
• But it still uses only two observations
• Would like to incorporate all the data, if possible
• Standard Deviation
• Take each observation’s difference from the mean, square it, add all such squared differences, divide the result by the number of observations (n − 1 for a sample), then take the square root
• A summary statistic of how much scores vary from the mean
• Roughly, the typical distance of scores from the mean
p.397 Sekaran
Variability
• Variance
• Square of standard deviation
• Average of squared distances of individual points from the
mean
• High variance means that most scores are far away
from the mean. Low variance indicates that most
scores cluster tightly about the mean.
• The amount that one score differs from the mean is called its deviation score (deviate)
• The sum of all squared deviation scores in a sample is called the sum of squares (the deviation scores themselves always sum to zero)
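The variability measures on the last two slides can be computed side by side. A Python sketch on eight invented scores. Note that `statistics.quantiles` uses one particular quartile method by default, and other packages (including SPSS) may use a slightly different one, so the IQR can differ a little between programs.

```python
import statistics

data = [4, 8, 6, 5, 3, 7, 9, 6]   # hypothetical scores
mean = statistics.mean(data)

value_range = max(data) - min(data)
q1, _, q3 = statistics.quantiles(data, n=4)   # quartiles (default 'exclusive' method)
iqr = q3 - q1

deviations = [x - mean for x in data]          # deviation scores; they sum to zero
sum_of_squares = sum(d ** 2 for d in deviations)
variance = sum_of_squares / (len(data) - 1)    # sample variance (n - 1)
sd = variance ** 0.5                           # standard deviation

print(f"range = {value_range}, IQR = {iqr}, SS = {sum_of_squares}, "
      f"variance = {variance:.3f}, SD = {sd:.3f}")
```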

Descriptive Statistics
• Variability (cont’d)
• Confidence intervals
• A range of values constructed so that, over repeated samples, 95% of such intervals would contain the population mean
• Typically extends about two (1.96) standard errors above and below the sample statistic
• Standard error: the standard deviation of a statistic’s sampling distribution, e.g., SD/√n for the mean (for more see p. 287 Sekaran)
• Standard scores (Zs)
• Deviation from the mean divided by standard deviation
• Mean of all Zs = 0, SD = 1
• Useful for computing interaction scores in regression analyses
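A Python sketch of a 95% confidence interval for a mean and of standard (z) scores, on ten invented values. It uses the normal-approximation multiplier 1.96. For a sample this small, a t critical value (about 2.262 for 9 degrees of freedom) would give a slightly wider, more exact interval.

```python
import math
import statistics

data = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]   # hypothetical sample
n = len(data)
mean = statistics.mean(data)
sd = statistics.stdev(data)
se = sd / math.sqrt(n)                 # standard error of the mean

# Approximate 95% CI: mean +/- 1.96 standard errors (normal approximation).
lower, upper = mean - 1.96 * se, mean + 1.96 * se

# Standard (z) scores: deviation from the mean divided by the SD.
z_scores = [(x - mean) / sd for x in data]

print(f"mean = {mean:.2f}, SE = {se:.3f}, 95% CI = ({lower:.2f}, {upper:.2f})")
print("mean of z =", round(statistics.mean(z_scores), 10),
      "SD of z =", round(statistics.stdev(z_scores), 10))
```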
Statistical graphs of data
• A picture is worth a thousand words!

• Graphs for numerical data:
• Histograms
• Frequency polygons

• Graphs for categorical data:
• Bar graphs
• Pie charts
Histograms
◼ Univariate histograms
[Figure: histogram of Exam 1 scores (N = 13, Mean = .80, Std. Dev = .12)]
Histograms
• f on y axis (could also plot p or %)
• X values (or midpoints of class intervals) on x axis
• Plot each f as a bar of equal width
• Bars touch: no gaps between bars
Bar Graphs
• For categorical data
• Like a histogram, but with gaps between bars
• Useful for showing two samples side-by-side
Frequency distribution of random errors
• As the number of measurements increases, the distribution becomes more stable
• The larger the effect, the fewer the data you need to identify it
• Many measurements of continuous variables show a bell-shaped curve of values; this is known as a Gaussian (normal) distribution.
In conclusion
• Descriptive statistics are used to summarize data
from individual respondents, etc.
• They help to make sense of large numbers of individual
responses, to communicate the essence of those
responses to others
• They focus on typical or average scores, the
dispersion of scores over the available responses,
and the shape of the response curve
Inferential Statistics
◼ Allows for comparisons across variables
◼ e.g., is there a relation between one’s occupation and their reason for using the public library?
◼ Hypothesis Testing
Inferential Statistics
• Inferential statistics are used to draw conclusions
about a population by examining the sample

[Figure: Population and Sample]
Inferential Statistics
• Accuracy of inference depends on
representativeness of sample from population
• random selection
• equal chance for anyone to be selected makes sample
more representative
Inferential Statistics
• Inferential statistics help researchers test
hypotheses and answer research questions, and
derive meaning from the results
• a result found to be statistically significant by testing
the sample is assumed to also hold for the population
from which the sample was drawn
• the ability to make such an inference is based on the
principle of probability
Inferential Statistics
• Researchers set the significance level for each
statistical test they conduct
• by using probability theory as a basis for their tests, researchers can assess how likely a difference as large as the one they found would be if it were due to chance alone
Alternative and Null Hypotheses
• Inferential statistics test whether the data provide enough evidence to reject the null hypothesis (H0) in favor of the alternative (research) hypothesis (H1)
• in testing differences, the H1 would predict that differences would be found, while the H0 would predict no differences
• by setting the significance level (generally at .05), the researcher has a criterion for making this decision
Alternative and Null Hypotheses
• If the .05 level is achieved (p is equal to or less than .05), then a researcher rejects the H0 and accepts the H1
• If the .05 significance level is not achieved, then the H0 is retained (not rejected)
Associations: Errors to note
• Two types of pitfalls can occur that
affect the association between
exposure and disease
• Type 1 error: observing a difference
when in truth there is none
• Type 2 error: failing to observe a
difference where there is one.

Confidence Interval - Definition

A range of values for a variable constructed so that this range has a specified probability of including the true value of the variable
A measure of the study’s precision

[Figure: point estimate shown between a lower limit and an upper limit]

Sever
Statistical Measures of Chance

• Confidence interval
• A 95% C.I. extends about 2 standard errors either side of the sample estimate of effect (mean, risk, rate); if the study were repeated many times, about 95 out of 100 such intervals would contain the true population value

Sever
Interpreting Results

Confidence Interval: Range of values for a point estimate that has a specified probability of including the true value of the parameter.

Confidence Level: (1.0 − α), where α is the significance level, usually expressed as a percentage (e.g. 95%).

Confidence Limits: The upper and lower end points of the confidence interval.
Hypothetical Example of 95% Confidence Interval

Exposure: Caffeine intake (high versus low)
Outcome: Incidence of breast cancer
Risk Ratio: 1.32 (point estimate)
p-value: 0.14 (not statistically significant)
95% C.I.: 0.87 - 1.98

[Figure: the 95% C.I. (0.87 to 1.98) drawn on a risk-ratio axis from 0.0 to 2.0; it spans 1.0, the null value]
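The caffeine example can be read directly from its interval: because the 95% C.I. contains the null value of 1.0, the result is not significant at the .05 level. A hypothetical Python helper expressing that rule; it assumes the interval and the test come from the same method, and the null value is 1.0 for ratios (risk ratio, odds ratio) but 0.0 for a difference in means.

```python
def ci_excludes_null(lower: float, upper: float, null_value: float = 1.0) -> bool:
    """True if the confidence interval does not contain the null value."""
    return not (lower <= null_value <= upper)

# Caffeine / breast cancer example: RR = 1.32, 95% C.I. 0.87 - 1.98.
print(ci_excludes_null(0.87, 1.98))   # False: the CI includes 1.0, so not significant
```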
Types of Inferential Statistics
• Parametric vs. non-parametric statistics
• Non-parametric does not assume normal distribution of data
• T-test
• ANOVA (F)
• Correlations (r)
• Types of
• Multiple-regression (R)
• Regression weights (β); Variance explained (R²)

p.394 Sekaran
Tests of Mean Differences
• T-test
• Tests whether the means of two groups differ significantly (e.g., at the .05 level)
• Tests the difference between two sample means for significance
• pretest to posttest
• Relates to research design
• Perhaps used for information literacy instruction
• Compares differences on one independent variable
• Paired t-test = same group, two different times or measurements
• Can be used as a post-hoc or planned contrast after conducting ANOVA analyses
• Beware: running many t-tests inflates the overall chance of a Type I error, so use post-hoc procedures such as Scheffé’s test or Duncan’s multiple range test
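A minimal Python sketch of the two t statistics on invented data: Student's t for two independent groups (pooled variance) and the paired t, which is a one-sample t-test on the within-person differences. The resulting t is compared with a critical value from a t table (two-tailed, .05): 2.306 for df = 8 and 3.182 for df = 3. Packages such as SciPy (`scipy.stats.ttest_ind`, `scipy.stats.ttest_rel`) also report exact p-values.

```python
import math
import statistics

def independent_t(group_a, group_b):
    """Student's t statistic for two independent samples (pooled variance)."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    pooled = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    se = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / se
    return t, n_a + n_b - 2

def paired_t(before, after):
    """Paired t statistic: one-sample t-test on the after-minus-before differences."""
    diffs = [b - a for a, b in zip(before, after)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se, len(diffs) - 1

t, df = independent_t([5, 6, 7, 8, 9], [3, 4, 5, 6, 7])
print(f"independent t = {t:.2f}, df = {df}")   # t = 2.00, df = 8 (below 2.306)

t2, df2 = paired_t([10, 12, 9, 11], [12, 13, 11, 12])
print(f"paired t = {t2:.2f}, df = {df2}")
```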
Tests of Mean Differences

• ANOVA (F-test)
• Tests whether the means of three or more groups differ significantly (e.g., at the .05 level)
• Factorial ANOVA compares two or more independent variables
• Tests interaction effects: Does the effect of one IV depend on the level of the other IV?
• Repeated measures ANOVA: Same sample, multiple times/measurements
• ANOVA tests the difference among more than two means in a single test, which cannot be done with a single t-test
• Use follow-up t-tests sparingly to see which pairs of groups are significantly different from each other
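A one-way ANOVA compares between-group variation with within-group variation. A minimal Python sketch of the F statistic on three invented groups. With 2 and 6 degrees of freedom the .05 critical value of F is about 5.14, so F = 12 here would be significant.

```python
import statistics

def one_way_anova_f(*groups):
    """F statistic and (between, within) degrees of freedom for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-groups sum of squares: group size times squared distance of group mean from grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: squared distance of each score from its own group mean.
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, (k - 1, n - k)

F, dfs = one_way_anova_f([4, 5, 6], [6, 7, 8], [8, 9, 10])
print(f"F = {F:.2f}, df = {dfs}")
```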
Association tests: categorical
• Chi-square test of independence: two variables (nominal and nominal, nominal and ordinal, or ordinal and ordinal)
• Affected by the number of cells and the number of cases
• 2-tailed test = non-directional (null) hypothesis
• 1-tailed test = directional hypothesis
• Strength of association: Cramér’s V, Phi
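A Python sketch of the Pearson chi-square test of independence and Cramér's V for a table of counts; the 2 x 2 table is invented. This version applies no continuity correction (SciPy's `scipy.stats.chi2_contingency`, for example, applies Yates' correction to 2 x 2 tables by default), and for a 2 x 2 table Cramér's V equals the absolute value of Phi. The .05 critical value for df = 1 is 3.841.

```python
import math

def chi_square_independence(table):
    """Pearson chi-square statistic, degrees of freedom and Cramer's V for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n   # count expected under independence
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    cramers_v = math.sqrt(chi2 / (n * (min(len(table), len(table[0])) - 1)))
    return chi2, df, cramers_v

# Hypothetical 2 x 2 table: rows = exposure yes/no, columns = outcome yes/no.
chi2, df, v = chi_square_independence([[30, 10], [20, 40]])
print(f"chi2 = {chi2:.2f}, df = {df}, Cramer's V = {v:.3f}")  # chi2 = 16.67, df = 1, Cramer's V = 0.408
```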
Tests of Association

• Pearson coefficient (r)
• Assesses whether 2 variables are ‘linearly’ related to each other, and whether that relation is statistically significant (e.g., at the .05 level)
• The extent to which two variables are related across a group of subjects
• Reflects the direction and the strength of the relation
• Varies from –1 to +1.
• -1.00 is a perfect inverse relationship: the strongest possible inverse relationship
• 0.00 indicates the complete absence of a linear relationship
• 1.00 is a perfect positive relationship: the strongest possible direct relationship
• The closer a value is to 0.00, the weaker the relationship
• The closer a value is to -1.00 or +1.00, the stronger it is
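A minimal Python sketch of Pearson's r from deviation scores, plus the t statistic (df = n − 2) used to test whether r differs from zero. Python 3.10 and later also provide `statistics.correlation`, which can serve as a cross-check.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r_to_t(r, n):
    """t statistic (df = n - 2) for testing whether r differs from zero."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

r = pearson_r([1, 2, 3, 4], [1, 3, 2, 4])
print(f"r = {r:.2f}, t = {r_to_t(r, 4):.2f}")
```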
Tests of Association
• Types of Correlations
• When both variables are continuous: Pearson product-moment
• When both variables are nominal (categorical)
• Two categories for each variable: Phi
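• Multiple categories for each variable: Cramér’s V (Kappa when the two variables are ratings of the same categories, e.g., agreement between two raters)
• When both variables are ordinal: Spearman rank
• Significance of r: t-test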
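Spearman's rank correlation is Pearson's r computed on ranks, which suits ordinal variables such as level of education. A minimal Python sketch in which tied values receive the average of their ranks; the data in the usage line are invented.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson's r computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5

print(spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))  # monotonic but not linear
```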
Tests of Association
• Multiple correlation (R)
• Describes the relation between 3 or more variables (e.g., 2 predictors and one criterion)
• Two different formulae depending on whether or not predictors are correlated with each other
• Assumes linear relationships between the predictors and the criterion
• Significance of R: F-test
• Are the variables significantly related to each other (e.g., at the .05 level)?

pp. 405-407 Sekaran
More tests
• While correlation and regression both indicate
association between variables, correlation studies
assess the strength of that association
• Regression analysis, which examines the
association from a different perspective, yields an
equation that uses one variable to explain the
variation in another variable.
• Regression is used to predict the value of one
variable by knowing the value of another variable
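Simple linear regression yields an equation, y = a + b·x, that predicts one variable from another. A minimal ordinary least-squares sketch in Python. The data (years in school and a self-care score) are invented for illustration and are not taken from the study cited later in these slides.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (intercept a, slope b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: years in school (x) and a self-care score (y).
years = [0, 3, 6, 9, 12]
score = [2.0, 3.1, 3.9, 5.2, 5.8]
a, b = fit_line(years, score)
print(f"predicted score = {a:.2f} + {b:.3f} * years")
print(f"prediction for 10 years in school: {a + b * 10:.2f}")
```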
More tests
• Multiple regression examines the relationship between a dependent variable (the outcome, expected to change in response to changes in the independent variables) and two or more independent variables (predictors, which may be manipulated by the researcher or simply measured)
• Stepwise multiple regression predicts the value of a
dependent variable using independent variables,
and it also examines the influence, or relative
importance, of each independent variable on the
dependent variable
Which Test to Use?
Scale of Data                              Test
Nominal                                    Chi-square test
Ordinal                                    Mann-Whitney U test
Interval (continuous), 2 groups            T-test
Interval (continuous), 3 or more groups    ANOVA
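The decision table above can be encoded as a small lookup. `choose_test` is a hypothetical helper, not part of any package; treating ratio data like interval data is an assumption consistent with the measuring-scales slide, and the table itself is a simplification (the Mann-Whitney U test, for instance, compares two groups).

```python
def choose_test(scale: str, groups: int = 2) -> str:
    """Return the test suggested by the 'Which Test to Use?' table."""
    scale = scale.lower()
    if scale == "nominal":
        return "Chi-square test"
    if scale == "ordinal":
        return "Mann-Whitney U test"
    if scale in ("interval", "ratio", "continuous"):
        if groups < 2:
            raise ValueError("need at least two groups to compare")
        return "T-test" if groups == 2 else "ANOVA"
    raise ValueError(f"unknown scale of measurement: {scale!r}")

print(choose_test("Interval", groups=3))
```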
Inferential Statistics

Statistical techniques used for different types of variables

Dependent variable | Independent variable: Continuous   | Independent variable: Categorical
Continuous         | Correlation (2 var), Regression    | T-test (2 groups); ANOVA
Categorical        | not listed                         | Chi-square, Phi, Kappa, Spearman rank correlation

See also p. 405 Sekaran


General guidelines for presenting results
• Usually, first present the general characteristics of the study participants.
• Present the results according to your objectives/research questions
• Present descriptive statistics of the variables being investigated
• Then present the results of the inferential statistics
• Start with univariate findings
• Then multivariate findings
• Both tables and text can be used to present the data
Presentation guidelines
• Do not use Figures and Tables to present the same
results
• Number your tables consecutively
• Ensure that the reader can understand the Table or
Figure without reading the text
• Number figures consecutively
• Identify both figures and tables in your results
description
• Tables and Figures should be presented after their
description in the text
Example 1: General characteristics
In all, 215 mother/caregiver-child pairs were approached, of whom 200 agreed and consented to participate in the study. Table 1 shows the demographic characteristics of the mothers/caregivers and their children. The mean (SD) age of the mothers/caregivers was 27 (5.12) years. The majority of the mothers/caregivers were married (96%), 49.0% had no formal education and 77.0% were Christians. The mean (SD) age of the children was 12 (5.15) months and the majority were males (56.5%).

Source: https://bmcnutr.biomedcentral.com/articles/10.1186/s40795-020-00393-0

Presentation of variables investigated

Presenting results on your objectives: descriptive

Determinants of minimum adequate diet
Table 4 shows the univariate and multivariate
determinants of minimum adequate diet. Significant
determinants of adequate diet were mothers/caregivers
having high knowledge of child feeding
recommendations, and the child’s father
reportedly earning adequate income for the upkeep
of the family.

Presentation of results: Inferential

Multivariate associations between participant
characteristics and self-care behaviours
Table 3 presents the regression models of factors
associated with adherence to the four self-care
behaviours. Number of years in school was
associated with frequency of adhering to diet (r =
0.223, p = 0.002), exercise (r = 0.168, p = 0.022), and
foot care (r = 0.153, p = 0.037).

Source: https://bmcendocrdisord.biomedcentral.com/track/pdf/10.1186/s12902-017-0169-3.pdf

Writing the data analysis section of your thesis
• Mention the statistical software that was used
• Describe all the statistical tools used in your data analysis and how they were used
• How one controlled for confounding
• Methods for stratified analyses or interactions
• How missing data were addressed
• Explain how variables were handled in the analysis, i.e. as continuous or categorical, and if categorical, what the categories were
• Describe all the descriptive statistics that were used
• Describe all the inferential statistics that were used
• Do not describe tools that you did not use
• Indicate the confidence level and level of significance used
Example 1: Statistical or data analysis
Data collected were keyed into Microsoft Excel and then transferred to the Statistical Package for the Social Sciences (SPSS) statistics software for analysis. Descriptive statistics of frequencies, mean, and standard deviation were employed to describe the data. Marital status was categorized into married (including married, cohabiting, and living together) and not married (including single, divorced, and widowed); occupation status into employed and unemployed; educational status into high (including senior high school and tertiary level of education) and low (including no formal education, primary and junior high school level); and monthly income into ≥ GHC 500 and < GHC 500. To determine univariate factors of WDDS, Pearson correlation analysis was used for continuous variables (i.e. nutrition knowledge, attitudes, WDDS, age, number of antenatal care visits, gestation, parity and household size), and Student’s t-test and one-way ANOVA were employed for categorical variables (marital status, occupation, monthly income, and educational status) and WDDS. Multiple linear regression analysis was used to determine factors associated with the dietary diversity of the pregnant women. The independent variables included nutrition knowledge, attitudes towards nutrition, age, marital status, educational status, monthly income, occupation, household size, parity, gestation, and number of antenatal care visits. A p-value of <0.05 was considered significant.

Example 2: Data analysis
We analysed the data using the Statistical Package for the Social Sciences (SPSS) software. Descriptive statistics of mean, standard deviation and frequencies were used to describe the data. The dependent variable was minimum adequate diet, which was classified into those who met the criteria (Yes) and those who did not (No). Independent variables were child’s age (6-11 months vs. ≥ 12 months), mother’s age (< 30 years, ≥ 30 years), mother’s level of education (No formal education, High, Low), mother’s employment status (Employed, Not employed), child’s father having adequate income (Yes/No), marital status of mother (Married, Single), and religion (Christianity, Islamic, Traditionalist). To evaluate determinants of adequate diet, univariate and multivariate tests were used. The univariate tests adopted were the Chi-square test and Fisher’s exact test; Fisher’s exact test was used when a response category had fewer than 10 participants. To identify factors associated with minimum adequate diet while adjusting for confounders, multivariate logistic regression (a priori selection) was conducted. A p-value of less than 0.05 was considered significant.

THANK YOU
Contact: 0208442438
vmogre@uds.edu.gh
