Data Analysis - PHARM D 2025-Students
PRESENTATION
VICTOR MOGRE (Ph.D.)
Population Sample
What needs to be done
before data analysis?
• Research questions/objectives are well defined.
• Variables are well defined
• Data collection is completed
• Be very clear in your mind about what you are
looking for
• What is your sample size?
• How many variables were investigated?
• Interval/Ratio Categories
• Values are ordered and have equal spacing between
them
• Salary measured in dollars
• Age of an individual
• Number of children
• Thermometer scores measure intensity on issues
• Views on abortion
• Whether someone likes China
• Feelings about US-Japan trade issues
• Any dichotomous variable
• A variable that takes on the value 0 or 1.
• A person’s gender (M=0, F=1)
Measuring Data
• Discrete Data
• Takes on only integer values
• whole numbers no decimals
• E.g., number of people in the room
• Continuous Data
• Takes on any value
• All numbers including decimals
• E.g., Rate of population increase
Discrete Data
Example: Absenteeism for 50 employees
I          II          III                  IV        V
Absences   Frequency   Relative Frequency   x − X̄     f(x − X̄)²
x          f           f/N
0          3           .06                  -4.86     70.8
1          2           .04                  -3.86     29.8
2          5           .10                  -2.86     40.9
3          8           .16                  -1.86     27.7
4          7           .14                  -0.86     5.2
5          2           .04                  0.14      .04
6          13          .26                  1.14      16.9
7          2           .04                  2.14      9.2
8          4           .08                  3.14      39.4
9          0           .00                  4.14      0
10         1           .02                  5.14      26.4
11         2           .04                  6.14      75.4
12         0           .00                  7.14      0
13         1           .02                  8.14      66.3
           N=50        1.00                           Σ = 408.02
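The columns of the absenteeism table can be recomputed from the first two columns alone. The sketch below is only an illustration: the variable names are assumptions, and the counts are copied from the table.

```python
# Absence counts (x) and how many employees had each count (f), from the table.
absences = list(range(14))
frequency = [3, 2, 5, 8, 7, 2, 13, 2, 4, 0, 1, 2, 0, 1]

n = sum(frequency)  # total number of employees (N)

# Column III: relative frequency f/N.
relative_frequency = [f / n for f in frequency]

# Mean number of absences, weighting each value by its frequency.
mean = sum(x * f for x, f in zip(absences, frequency)) / n

# Column IV: deviation of each value from the mean.
deviations = [x - mean for x in absences]

# Column V: frequency-weighted squared deviations, and their total.
weighted_squares = [f * d ** 2 for f, d in zip(frequency, deviations)]
sum_of_squares = sum(weighted_squares)

print(n, round(mean, 2), round(sum_of_squares, 2))  # 50 4.86 408.02
```

The mean of 4.86 is the value subtracted from each x in column IV.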
Absenteeism for 50 employees
• Frequency
• Relative Frequency
• Number of times a particular event takes place
in relation to the total.
• For example, about a quarter of the people were
absent exactly 6 times in the year.
Continuous Data
p.396 Sekaran
School of Medicine, University for Development Studies, Tamale.
Frequency distribution
Measures of Central
Tendency
• Mode
• The category occurring most often.
• The most frequent score in a distribution
• Good for nominal data
• Median
• The middle observation or 50th percentile.
• Mean
• The average of the observations
• The ‘average’ score—sum of all individual scores divided by the
number of scores
• many statistics are based on the mean
• has a number of useful statistical properties
• however, can be sensitive to extreme scores (“outliers”)
Source: www.wilderdom.com/.../L2-1UnderstandingIQ.html
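A short sketch of the three measures of central tendency, using Python's standard `statistics` module. The scores are hypothetical and include one deliberate outlier to show how sensitive the mean is compared with the median.

```python
import statistics

# Hypothetical exam scores; the last value is a deliberate outlier.
scores = [2, 3, 3, 4, 5, 6, 40]

mode = statistics.mode(scores)      # most frequent score
median = statistics.median(scores)  # middle observation
mean = statistics.mean(scores)      # sum of scores / number of scores

# Dropping the outlier moves the mean a lot but the median only slightly.
trimmed = scores[:-1]
mean_without_outlier = statistics.mean(trimmed)
median_without_outlier = statistics.median(trimmed)

print(mode, median, mean)  # 3 4 9
print(round(mean_without_outlier, 2), median_without_outlier)  # 3.83 3.5
```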
Descriptive Statistics
• Variability
• Range
• Difference between the two most extreme observations
• The difference between the largest and smallest observation.
• Limited measure of dispersion
• Inter-quartile range
• Divide observations into quarters & use the middle half
• The difference between the 75th percentile and the 25th percentile in the
data.
• An improvement on the range because it divides the data more finely.
• But it still only uses two observations
• Would like to incorporate all the data, if possible
• Standard Deviation
• Take each observation’s difference from the mean, square it, add all such
squared differences, divide the result by number of observations, and take
the square root
• A summary statistic of how much scores vary from the mean
• Represents the average amount of dispersion in a sample
p.397 Sekaran
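The three measures of variability above can be sketched with the standard `statistics` module. The scores are hypothetical, and the quartile method is an assumption: textbooks and software packages use slightly different conventions for percentiles.

```python
import statistics

# Hypothetical scores, used only for illustration.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: largest minus smallest observation.
value_range = max(scores) - min(scores)

# Inter-quartile range: 75th percentile minus 25th percentile.
# Quartile conventions differ between textbooks and packages;
# method="inclusive" is one common choice.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1

# Standard deviation, dividing by the number of observations
# (the population form); statistics.stdev divides by n - 1 instead.
sd = statistics.pstdev(scores)

print(value_range, iqr, sd)
```

Note that the range depends only on the two extreme scores, while the standard deviation uses every observation.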
Variability
• Variance
• Square of standard deviation
• Average of squared distances of individual points from the
mean
• High variance means that most scores are far away
from the mean. Low variance indicates that most
scores cluster tightly about the mean.
• The amount that one score differs from the mean is
called its deviation score (deviate)
• The sum of all squared deviation scores in a sample is
called the sum of squares
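In symbols, the quantities on this slide relate as follows. This is a sketch that divides by N, as in the standard deviation description earlier; many textbooks divide by N − 1 for a sample. The worked numbers reuse the absenteeism example (sum of squared deviations 408.02, N = 50).

```latex
\begin{align*}
d_i &= x_i - \bar{X} && \text{(deviation score)} \\
SS &= \sum_{i=1}^{N} d_i^{2} && \text{(sum of squares)} \\
s^{2} &= \frac{SS}{N} = \frac{408.02}{50} \approx 8.16 && \text{(variance)} \\
s &= \sqrt{s^{2}} \approx 2.86 && \text{(standard deviation)}
\end{align*}
```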
[Figure: plot of Exam 1 scores]
Histograms
POPULATION
Sample
Inferential Statistics
• Accuracy of inference depends on
representativeness of sample from population
• random selection
• equal chance for anyone to be selected makes sample
more representative
Inferential Statistics
• Inferential statistics help researchers test
hypotheses and answer research questions, and
derive meaning from the results
• a result found to be statistically significant by testing
the sample is assumed to also hold for the population
from which the sample was drawn
• the ability to make such an inference is based on the
principle of probability
Inferential Statistics
• Researchers set the significance level for each
statistical test they conduct
• by using probability theory as a basis for their tests,
researchers can assess how likely it is that the
difference they find is real and not due to chance
Alternative and Null
Hypotheses
• Inferential statistics test whether the sample data give
enough evidence to reject the null hypothesis (H0) in
favour of the alternative (research) hypothesis (H1)
• in testing differences, the H1 would predict that
differences would be found, while the H0 would predict
no differences
• by setting the significance level (generally at .05), the
researcher has a criterion for making this decision
Alternative and Null Hypotheses
• If the .05 level is achieved (p is equal to or less
than .05), then a researcher rejects the H0 and
accepts the H1
• If the .05 significance level is not achieved,
then the H0 is retained
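The decision rule can be written as a tiny function. The name `decide` and its return strings are hypothetical, chosen only for illustration.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Return the decision for a significance test at level alpha."""
    # p equal to or less than alpha: reject the null hypothesis.
    if p_value <= alpha:
        return "reject H0"
    # Otherwise the null hypothesis is retained.
    return "retain H0"


print(decide(0.03))  # reject H0
print(decide(0.20))  # retain H0
```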
Associations: Errors to note
• Two types of pitfalls can occur that
affect the association between
exposure and disease
• Type 1 error: observing a difference
when in truth there is none
• Type 2 error: failing to observe a
difference where there is one.
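A small simulation can make the Type 1 error concrete. In the sketch below both groups are drawn from the same population, so any "significant" difference is a false positive. The sample sizes, the seed and the use of a z-test with a known standard deviation are simplifying assumptions made for illustration.

```python
import random
from statistics import NormalDist, mean

random.seed(1)  # fixed seed so the run is repeatable

ALPHA = 0.05
N_PER_GROUP = 30
N_STUDIES = 2000
standard_normal = NormalDist()


def z_test_p_value(group_a, group_b, sigma=1.0):
    """Two-sided p-value for a difference in means of two equal-sized
    groups, assuming the population standard deviation sigma is known."""
    standard_error = sigma * (2 / len(group_a)) ** 0.5
    z = (mean(group_a) - mean(group_b)) / standard_error
    return 2 * (1 - standard_normal.cdf(abs(z)))


# Both groups come from the same population, so H0 is true in every study
# and every rejection is a Type 1 error.
false_positives = 0
for _ in range(N_STUDIES):
    a = [random.gauss(0, 1) for _ in range(N_PER_GROUP)]
    b = [random.gauss(0, 1) for _ in range(N_PER_GROUP)]
    if z_test_p_value(a, b) <= ALPHA:
        false_positives += 1

type_1_rate = false_positives / N_STUDIES
print(type_1_rate)  # expected to be close to ALPHA
```

With a significance level of .05, roughly 1 study in 20 reports a difference that is not there.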
Confidence Interval - Definition
• Point estimate
• Confidence interval
• 95% C.I. means that, in repeated sampling, the interval
of about 2 standard errors either side of the sample
estimate (mean, risk, rate) contains the true
population value 95 times out of 100
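A minimal sketch of a point estimate and its 95% confidence interval for a mean. The blood pressure readings are hypothetical, and the normal multiplier 1.96 is used as the precise form of the "2 standard errors" rule of thumb.

```python
import statistics

# Hypothetical sample of systolic blood pressure readings (mmHg).
sample = [118, 122, 125, 130, 121, 127, 119, 124, 126, 128]

n = len(sample)
point_estimate = statistics.mean(sample)

# Standard error of the mean: sample standard deviation / sqrt(n).
standard_error = statistics.stdev(sample) / n ** 0.5

# 1.96 is the normal-distribution multiplier behind the "2 standard errors"
# rule of thumb; for a sample this small a t multiplier would be a bit larger.
lower = point_estimate - 1.96 * standard_error
upper = point_estimate + 1.96 * standard_error

print(point_estimate, round(lower, 1), round(upper, 1))
```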
Interpreting Results
p.394 Sekaran
Tests of Mean Differences
• T-test
• Tests whether the means of two groups differ from
each other at the 95% confidence level (α = .05)
• Test the difference between two sample means for significance
• pretest to posttest
• Relates to research design
• Perhaps used for information literacy instruction
• Compares differences on one independent variable
• Paired t-test= Same group, two different times or
measurements
• Can be used as a post-hoc or planned contrast after conducting
ANOVA analyses
• Beware: running many t-tests inflates the overall Type 1 error rate, so use
a multiple-comparison procedure such as Scheffé’s test or Duncan’s multiple range test
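As an illustration of the paired t-test (same group, pretest to posttest), the sketch below computes the t statistic by hand and compares it with a tabled critical value. The scores are hypothetical; in practice a statistics package would also report the exact p-value.

```python
import statistics

# Hypothetical pretest and posttest scores for the same six students.
pretest = [10, 12, 9, 14, 11, 13]
posttest = [12, 15, 10, 17, 13, 14]

# A paired t-test works on the within-person differences.
differences = [post - pre for pre, post in zip(pretest, posttest)]
n = len(differences)

mean_difference = statistics.mean(differences)
standard_error = statistics.stdev(differences) / n ** 0.5
t_statistic = mean_difference / standard_error

# Two-tailed critical value of t for alpha = .05 with n - 1 = 5 degrees
# of freedom, taken from a standard t table.
T_CRITICAL_DF5 = 2.571
significant = abs(t_statistic) > T_CRITICAL_DF5

print(round(t_statistic, 2), significant)  # 5.48 True
```

Because the t statistic exceeds the critical value, the difference between pretest and posttest means is significant at the .05 level for these made-up data.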
Tests of Mean Differences
• ANOVA (F-test)
• Tests whether the means of three or more groups differ
from each other at the 95% confidence level (α = .05)
• Factorial ANOVA includes two or more independent variables
• Tests interaction effects: Does the effect of one IV depend on the level
of the other IV?
• Repeated measures ANOVA: Same sample, multiple
times/measurements
• I.e., ANOVA can test the difference among more than two
means in a single test, which cannot be done with a t test
• Conduct follow-up t-tests sparingly to see which pairs of groups are
significantly different from each other
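A one-way ANOVA can be sketched by hand to show where the F statistic comes from: the variation between group means is compared with the variation within groups. The three groups below are hypothetical, and the critical value is read from a standard F table.

```python
import statistics

# Hypothetical scores for three independent groups of five.
groups = [
    [4, 5, 6, 5, 5],
    [6, 7, 8, 7, 7],
    [9, 8, 10, 9, 9],
]

all_scores = [score for group in groups for score in group]
grand_mean = statistics.mean(all_scores)

# Between-groups sum of squares: group means around the grand mean.
ss_between = sum(
    len(group) * (statistics.mean(group) - grand_mean) ** 2 for group in groups
)
# Within-groups sum of squares: scores around their own group mean.
ss_within = sum(
    (score - statistics.mean(group)) ** 2 for group in groups for score in group
)

df_between = len(groups) - 1
df_within = len(all_scores) - len(groups)

# F is the ratio of the between-groups to the within-groups mean square.
f_statistic = (ss_between / df_between) / (ss_within / df_within)

# Critical value of F(2, 12) at alpha = .05, from a standard F table.
F_CRITICAL_2_12 = 3.89
significant = f_statistic > F_CRITICAL_2_12

print(f_statistic, significant)  # 40.0 True
```

A significant F only says that at least one group mean differs; follow-up comparisons are needed to find which pairs differ.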
Association tests: categorical
• Chi-square test of independence: two variables
(nominal and nominal, nominal and ordinal, or
ordinal and ordinal)
• Affected by number of cells, number of cases
• 2-tailed test = non-directional hypothesis
• 1-tailed test = directional hypothesis
• Strength of association: Cramér’s V, Phi
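The chi-square test of independence compares observed cell counts with the counts expected if the two variables were unrelated. The 2 x 2 table below is hypothetical, and no continuity correction is applied in this sketch.

```python
# Hypothetical 2 x 2 table of observed counts:
# rows = exposed / not exposed, columns = disease / no disease.
observed = [
    [30, 20],
    [15, 35],
]

row_totals = [sum(row) for row in observed]
column_totals = [sum(column) for column in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / N.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * column_totals[j] / n
        chi_square += (count - expected) ** 2 / expected

# Degrees of freedom: (rows - 1) * (columns - 1) = 1 for a 2 x 2 table.
# 3.841 is the chi-square critical value for df = 1 at alpha = .05.
CHI_SQUARE_CRITICAL_DF1 = 3.841
significant = chi_square > CHI_SQUARE_CRITICAL_DF1

# Cramer's V; for a 2 x 2 table it equals the absolute value of phi.
smaller_dimension = min(len(observed), len(observed[0]))
cramers_v = (chi_square / (n * (smaller_dimension - 1))) ** 0.5

print(round(chi_square, 2), significant, round(cramers_v, 2))  # 9.09 True 0.3
```

The statistic grows with both the number of cases and the number of cells, which is why the degrees of freedom and a measure such as Cramér's V are reported alongside it.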
Tests of Association
pp.405-407 Sekaran
More tests
• While correlation and regression both indicate
association between variables, correlation studies
assess the strength of that association
• Regression analysis, which examines the
association from a different perspective, yields an
equation that uses one variable to explain the
variation in another variable.
• Regression is used to predict the value of one
variable by knowing the value of another variable
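The difference between the two can be seen in a few lines: correlation gives a single number for the strength of the association, while regression gives an equation for prediction. The data and the helper `predict` are hypothetical, for illustration only.

```python
# Hypothetical data: hours studied (x) and exam score (y).
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(scores) / n

# Sums of squares and cross-products around the means.
ss_x = sum((x - mean_x) ** 2 for x in hours)
ss_y = sum((y - mean_y) ** 2 for y in scores)
sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores))

# Correlation: strength of the linear association, between -1 and 1.
r = sp_xy / (ss_x * ss_y) ** 0.5

# Regression: the least-squares line y = intercept + slope * x.
slope = sp_xy / ss_x
intercept = mean_y - slope * mean_x


def predict(x):
    """Predicted score for a given number of hours."""
    return intercept + slope * x


print(round(r, 3), round(slope, 1), round(intercept, 1))  # 0.994 4.1 47.7
print(round(predict(6), 1))  # 72.3
```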
More tests
• Multiple regression examines the relationship
between a dependent variable (the outcome, which
changes in response to the independent variables)
and two or more independent variables (the
predictors, or manipulated variables)
• Stepwise multiple regression predicts the value of a
dependent variable using independent variables,
and it also examines the influence, or relative
importance, of each independent variable on the
dependent variable
Which Test to Use?
Scale of Data            Test      Groups
Interval (continuous)    T-test    2 groups
Interval (continuous)    ANOVA     3 or more groups
Inferential Statistics
Source: https://bmcnutr.biomedcentral.com/articles/10.1186/s40795-020-00393-0
Source: https://bmcendocrdisord.biomedcentral.com/track/pdf/10.1186/s12902-017-0169-3.pdf