
Lab Report Biostats

Uploaded by

FARALIZA AHMAD

LAB Report Biostats

experimental design analysis (Universiti Teknologi MARA)



Downloaded by FARALIZA AHMAD (2022741531@student.uitm.edu.my)

FACULTY OF APPLIED SCIENCE

BACHELOR OF SCIENCE (HONS) BIOLOGY

BIO 610: EXPERIMENTAL BIOLOGY: DESIGN & ANALYSIS

PREPARED BY:

NAME                                        STUDENT ID
MUHAMMAD BIN ABU BAKAR                      2021483292
AIMAN ASYRAAF BIN AHMAD KAMAL ARIFFIN       2021488708
AHMAD AFIF BIN AHMAD ASRI                   2021480642
UNGKU MUHAMMAD HAFIZ BIN UNGKU SUZAINI      2021774005

GROUP: AS2014B1 & AS2015B1

PREPARED FOR: DR. NURHAMIMAH BINTI ZAINAL ABIDIN

LAB 1: FREQUENCY TABLES, STEM-AND-LEAF PLOTS AND SUMMARY STATISTICS


1.0 INTRODUCTION

In the field of data analysis, understanding the distribution of data is fundamental for making
informed decisions and drawing meaningful conclusions. One of the initial steps in exploring data
distribution involves constructing a frequency table to organize and summarize the data. A frequency
table provides a clear representation of how often specific values or ranges occur within a dataset.
Alongside frequency tables, stem and leaf plots offer a visual tool that allows for a more detailed
examination of the data's distribution, providing insights into patterns, clusters, and outliers.

This experiment aims to equip us with the knowledge and skills necessary to analyse data distribution
through the utilization of frequency tables, stem and leaf plots, and summary statistics. By employing
these techniques, we can effectively explore, summarize, and interpret datasets, uncovering valuable
insights that inform decision-making and enhance our understanding of the data at hand.

Moreover, in the field of statistics, our primary focus lies on understanding the spread of measured
values rather than dwelling on specific outcomes of individual measurements. Our initial step
involves contemplating how to represent the distributions of random numbers. A considerable
amount of statistical research is dedicated to identifying and defining the distribution that
corresponds to a particular set of measurements or observations.

In biology, we often classify the elements in our surroundings, and statistics follows a similar
approach. Probability distributions serve as fundamental concepts in statistics, capturing our utmost
interest. The data we collect can be represented using various probability distributions such as
binomial, normal, chi-square, among others. Summary statistics offer multiple techniques to classify
probability distributions, enabling us to focus on the most essential characteristics we need without
having to fully specify the entire distributions.

2.0 OBJECTIVE

1. To explore the PULSE-RATE data in the sample with a stem-and-leaf plot and frequency table.
2. To calculate and interpret summary statistics (descriptive statistics) of the PULSE-RATE
data.
3. To determine if the distribution follows normality.

3.0 HYPOTHESIS


If the resting pulse rates of the students range from 60 to 100 beats per minute, the data
fall within the normal range.

4.0 MATERIALS
- Stopwatch

5.0 METHODS

1. The resting pulse rate of each student in the class was counted and the data were tabulated.
2. A frequency distribution table was formed from the data and a histogram was constructed.
3. A stem-and-leaf plot of the data was constructed. The shape, location and spread of the
distribution were described.
4. The summary statistics of the class data set were calculated.
5. The summary statistics for males and females were calculated.
6. The 5-point summary for the PULSE-RATE data was determined.
7. A boxplot was drawn for the PULSE-RATE data and the shape and spread of the distribution
were described.
8. The PULSE-RATE data were entered into a computer file in Excel format and analysed
with SPSS.

6.0 RESULTS

1. Tabulate data
a. Table of raw data (number of beats per minute)

88 102 87 93 66
76 83 90 78 89
90 91 101 85 93
94 91 99 86 93
93 98 98 121 94
75 81 97 95 105

b. Frequency distribution table


Class width = $\frac{121 - 66}{5} = 11$


Class interval   Class boundary    Class midpoint   Frequency   Relative frequency   Cumulative frequency
66 – 76          65.5 – 76.5       71               3           0.10                 3
77 – 87          76.5 – 87.5       82               6           0.20                 9
88 – 98          87.5 – 98.5       93               16          0.53                 25
99 – 109         98.5 – 109.5      104              4           0.13                 29
110 – 120        109.5 – 120.5     115              0           0.00                 29
121 – 131        120.5 – 131.5     126              1           0.03                 30

∑ f = 30
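
The frequency table can be rebuilt directly from the raw data. The following is a minimal sketch, assuming Python; the variable names (`pulse`, `rows`) are illustrative and not part of the lab procedure, and the class width of 11 is taken from the calculation in this section.

```python
# Raw pulse rates (beats per minute) from the table of raw data.
pulse = [88, 102, 87, 93, 66, 76, 83, 90, 78, 89,
         90, 91, 101, 85, 93, 94, 91, 99, 86, 93,
         93, 98, 98, 121, 94, 75, 81, 97, 95, 105]

width = 11          # class width: (121 - 66) / 5
lower = min(pulse)  # first lower class limit (66)
n_classes = 6

rows = []
cumulative = 0
for k in range(n_classes):
    lo = lower + k * width
    hi = lo + width - 1                       # upper class limit
    freq = sum(lo <= x <= hi for x in pulse)  # count values inside the class
    cumulative += freq
    rows.append({
        "interval": (lo, hi),
        "boundary": (lo - 0.5, hi + 0.5),
        "midpoint": (lo + hi) / 2,
        "frequency": freq,
        "relative": round(freq / len(pulse), 2),
        "cumulative": cumulative,
    })

for r in rows:
    print(r)
```

Each row carries the same six columns as the table, so the printed output can be checked against it line by line.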

c. Histogram

[Figure: histogram "Distribution of student's Pulse-rate"; x-axis: Pulse-rate (class boundaries 65.5 – 76.5 to 120.5 – 131.5); y-axis: Frequency]

- The distribution shown in the histogram is symmetrical and bell-shaped.

2. Stem-and-leaf plot

Unordered stem-and-leaf plot

6 6

7 568

8 8913756

9 835303941048317

10 125

11

12 1

Ordered stem-and-leaf plot.

6 6

7 568

8 1356789

9 001133334457889

10 125

11

12 1
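
The ordered plot can be reproduced with a short script. This is a minimal sketch, assuming Python; the names `stems` and `plot` and the " | " separator are our own choices for illustration.

```python
from collections import defaultdict

pulse = [88, 102, 87, 93, 66, 76, 83, 90, 78, 89,
         90, 91, 101, 85, 93, 94, 91, 99, 86, 93,
         93, 98, 98, 121, 94, 75, 81, 97, 95, 105]

# Stem = tens (and hundreds) digits, leaf = units digit.
stems = defaultdict(list)
for x in sorted(pulse):
    stems[x // 10].append(x % 10)

# Print every stem between the smallest and largest, including empty ones.
plot = {}
for stem in range(min(stems), max(stems) + 1):
    plot[stem] = "".join(str(leaf) for leaf in stems.get(stem, []))
    print(f"{stem:>2} | {plot[stem]}")
```

Keeping the empty stem 11 in the output preserves the gap between 105 and 121, which is what makes the value 121 stand out.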

Summary of stem-and-leaf plot:

- It can be concluded that the stem-and-leaf plot is symmetrical.
- The stem-and-leaf plot is bell-shaped.
- The mode of the data set is 93.

3. Summary statistic calculation

Mean: $\bar{X} = \frac{\sum X}{n} = \frac{2732}{30} = 91.07$

Mode = 93

Median position: $\frac{n}{2} = \frac{30}{2} = 15$, so the median lies between the 15th and 16th ordered values, which are 91 and 93.

Median $= \frac{91 + 93}{2} = 92$

Variance: $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{3095.87}{30 - 1} = 106.75$

Standard deviation: $s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{106.75} = 10.33$
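
These hand calculations can be cross-checked in software. The sketch below assumes Python and its standard `statistics` module (the lab itself used SPSS); `statistics.variance` and `statistics.stdev` use the n − 1 denominator, matching the formulas above.

```python
import statistics

pulse = [88, 102, 87, 93, 66, 76, 83, 90, 78, 89,
         90, 91, 101, 85, 93, 94, 91, 99, 86, 93,
         93, 98, 98, 121, 94, 75, 81, 97, 95, 105]

mean = statistics.mean(pulse)          # sum of values / n
median = statistics.median(pulse)      # average of the 15th and 16th ordered values
mode = statistics.mode(pulse)          # most frequent value
variance = statistics.variance(pulse)  # sample variance (n - 1 denominator)
sd = statistics.stdev(pulse)           # sample standard deviation

print(f"mean={mean:.2f} median={median} mode={mode} "
      f"variance={variance:.2f} sd={sd:.2f}")
```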

4. 5-point summary & boxplot

66,75,76,78,81,83,85,86,87,88,89,90,90,91,91,93,93,93,93,94,94,95,97,98,98,99,101,102,105,121

Five-point summary: Min = 66, Q1 = 86, Median = 92, Q3 = 97, Max = 121

[Figure: boxplot of the PULSE-RATE data on an axis from 60 to 130, marking Min = 66, Q1 = 86, Median = 92, Q3 = 97 and Max = 121]

Interquartile range, IQR = Q3 – Q1 = 97 – 86 = 11

Q1 – 1.5 (IQR) = 86 – 1.5 (11) = 69.5

Q3 + 1.5 (IQR) = 97 + 1.5 (11) = 113.5

The outliers of the data are 66 and 121.
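
The quartiles and outlier fences can be checked as follows. This is a minimal sketch, assuming Python and assuming that Q1 and Q3 are taken as the medians of the lower and upper halves of the ordered data, which reproduces the values reported here; other quartile conventions can give slightly different results.

```python
import statistics

pulse = [88, 102, 87, 93, 66, 76, 83, 90, 78, 89,
         90, 91, 101, 85, 93, 94, 91, 99, 86, 93,
         93, 98, 98, 121, 94, 75, 81, 97, 95, 105]

data = sorted(pulse)
n = len(data)
half = n // 2
lower_half = data[:half]
# For odd n the middle value is left out of both halves.
upper_half = data[half:] if n % 2 == 0 else data[half + 1:]

q1 = statistics.median(lower_half)
q2 = statistics.median(data)
q3 = statistics.median(upper_half)
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(data[0], q1, q2, q3, data[-1])
print(iqr, lower_fence, upper_fence, outliers)
```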

Based on the boxplot constructed, the data are symmetric and normally distributed, with minimum
value 66, median 92 and maximum value 121.

7.0 DISCUSSION


The experiment started by gathering the pulse rate data of students in a class. Thirty students
volunteered to record their pulse rates for this experiment. The collected pulse rates ranged from 66
bpm to 121 bpm. The first section of this report focuses on the construction and interpretation of
frequency tables. To organize the data, a frequency table was created. The frequency table divided
the data into six classes, each with a width of eleven, namely 66 – 76, 77 – 87, 88 – 98, 99 – 109,
110 – 120, and 121 – 131. This table included class boundaries, midpoints, frequency, relative
frequency and cumulative frequency. A class boundary is the midpoint between the upper class limit
of one class and the lower class limit of the next. Midpoints are the middle values of each class and
are calculated by adding the lower and upper class limits and dividing by two. This tabular
representation enables us to observe the frequency of occurrence for different values or ranges,
providing a foundation for further analysis.

Following the exploration of frequency tables, we constructed stem-and-leaf plots, a graphical
technique that allows us to visualize the distribution of data. Stem-and-leaf plots provide a more
detailed depiction of individual data points, allowing us to identify trends, clusters, or gaps that might
not be evident in a frequency table alone. The distribution of the stem-and-leaf plot is symmetrical.
Next, the histogram was constructed. Class boundaries, rather than class limits, were used on the
x-axis to prevent any confusion, and the y-axis represented the frequencies of the respective classes.
Like the stem-and-leaf plot, the histogram's distribution is symmetrical and bell-shaped.

Next, a boxplot was created to depict the five-point summary of the data. The lowest data point was
66, the minimum value in the dataset, while the highest data point was 121, the maximum value in
the dataset. The value of Q2, the median of the data, was calculated to be 92. Q1 was determined as
the median of the lower half of the ordered data (the 15 values below Q2), giving 86, and Q3 as the
median of the upper half (the 15 values above Q2), giving 97. The interquartile range (IQR) was then
calculated as 11, which helps identify potential outliers. The distribution shape of the boxplot, like
the histogram, is symmetrical. The identified outliers were 66 and 121.

8.0 CONCLUSION


The experiment aimed to examine the PULSE-RATE data in a sample by utilizing stem-and-leaf
plots and frequency tables. It also involved calculating and interpreting descriptive statistics of the
PULSE-RATE data and assessing whether the distribution followed a normal pattern. The use of
stem-and-leaf plots allowed for grouping and organizing the data, leading to a more insightful
analysis. The outcomes revealed the shape of the distribution and the minimum and maximum
values in the dataset. Based on the stem-and-leaf plot, histogram and boxplot, the data are
symmetric and normally distributed, with minimum value 66, median 92 and maximum value 121.


LAB 2: PROBABILITY


1.0 INTRODUCTION

In the field of biology, understanding the variability of experimental outcomes is of paramount
importance. However, precise calculations can often be elusive due to the inherent complexity of
biological systems and the presence of random events. To tackle this challenge, biologists frequently
turn to probability distributions, powerful tools that allow them to model and analyse uncertain
events. Two discrete probability distributions that find widespread application in biology are the
Binomial distribution and the Poisson distribution. These distributions provide valuable insights into
the likelihood of specific outcomes in various biological experiments and processes.

The Binomial distribution is a discrete probability distribution that models the number of successes
in a fixed number of independent Bernoulli trials. The Binomial distribution is characterized by two
parameters: "n" and "p." "n" represents the number of trials, and "p" denotes the probability of
success in each trial. The random variable "X" follows a Binomial distribution, representing the
number of successes observed in "n" trials. The Binomial distribution finds numerous applications
in biology, such as modelling the distribution of gene variants, assessing the success rates of drug
treatments, analysing the outcomes of breeding experiments, and studying the occurrence of
mutations in populations.

The Poisson distribution is another discrete probability distribution commonly used in biology to
model the number of rare events that occur within a fixed interval of time or space. It is particularly
suitable for situations where events happen independently at a constant rate over the given interval.
The Poisson distribution finds applications in various biological scenarios, such as modelling the
number of cell divisions, studying the occurrence of rare diseases, estimating the rate of mutations,
and analysing the frequency of ecological events like species interactions and population
fluctuations. Both the Binomial and Poisson distributions are valuable tools in biology, enabling
researchers to make predictions, perform statistical analyses, and gain insights into the variability of
outcomes in biological experiments and natural processes.

2.0 OBJECTIVES

To calculate and interpret binomial probabilities.

3.0 MATERIALS AND METHOD


1. The problem was read and understood from the given question, which required
comprehending the concept of probability distributions, particularly the Binomial and
Poisson distributions, commonly used in biology.
2. A calculation was made based on the provided data, where specific probabilities and
outcomes were analysed using the Binomial and Poisson distributions in the context of
biological experiments.

4.0 RESULTS

1. Binomial Problem
a. Suppose a treatment is successful 25% of the time. The treatment is used in 3 patients.
Using the binomial formula learned in class, calculate the probability of seeing 0 of
three positive responses. Then, calculate the probability of seeing 1 response, 2
responses, and 3 responses. These probabilities comprise the probability distribution
X ~ b(n = 3, p = 0.25).

n = 3, p = 0.25, q = 0.75

$P(X = x) = \frac{n!}{(n - x)!\,x!}\, p^{x} q^{n - x}$

$P(X = 0) = \frac{3!}{(3 - 0)!\,0!}\,(0.25)^{0}(0.75)^{3 - 0} = 0.4219$

$P(X = 1) = \frac{3!}{(3 - 1)!\,1!}\,(0.25)^{1}(0.75)^{3 - 1} = 0.4219$

$P(X = 2) = \frac{3!}{(3 - 2)!\,2!}\,(0.25)^{2}(0.75)^{3 - 2} = 0.1406$

$P(X = 3) = \frac{3!}{(3 - 3)!\,3!}\,(0.25)^{3}(0.75)^{3 - 3} = 0.0156$
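
The four probabilities can be verified numerically. The following is a minimal sketch, assuming Python; `binomial_pmf` is a helper name introduced here for illustration, and `math.comb(n, x)` computes n! / ((n − x)! x!).

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 3, 0.25
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

for x, prob in enumerate(pmf):
    print(f"P(X = {x}) = {prob:.4f}")
```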


b. [Figure: bar chart "Number of successes vs Probability"; x-axis: Number of successes (0 to 3); y-axis: Probability; bar heights 0.4219, 0.4219, 0.1406 and 0.0156]
c. P (x ≤ 0) = P (x = 0) = 0.4219
P (x ≤ 1) = P (x = 0) + P (x = 1)
= 0.4219 + 0.4219
= 0.8438
P (x ≤ 2) = P (x = 0) + P (x = 1) + P (x = 2)
= 0.4219 + 0.4219 + 0.1406
= 0.9844
P (x ≤ 3) = P (x = 0) + P (x = 1) + P (x = 2) + P (x = 3)
= 0.4219 + 0.4219 + 0.1406 + 0.0156
=1
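
The cumulative probabilities are running totals of the individual probabilities. A minimal sketch, assuming Python and using `itertools.accumulate` for the running sum (the variable names are illustrative):

```python
from itertools import accumulate
from math import comb

n, p = 3, 0.25
# Individual binomial probabilities P(X = x) for x = 0..3.
pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]
# Running totals give P(X <= x).
cdf = list(accumulate(pmf))

for x, prob in enumerate(cdf):
    print(f"P(X <= {x}) = {prob:.4f}")
```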

d. [Figure: bar chart "Number of successes vs Probability"; x-axis: Number of successes (0, 1, 2, 3 and x ≤ 1); y-axis: Probability; bar heights 0.4219, 0.4219, 0.1406, 0.0156 and 0.8438]


5.0 DISCUSSION
The results of the probability distribution reveal the potential outcomes of the treatment for the three
patients. The most likely scenario, with a probability of approximately 42%, is that none of the
patients will respond positively to the treatment (X = 0). This suggests that there is a significant
chance that the treatment might not be effective for any of the patients. Equally probable, also with
approximately 42% likelihood, is the scenario where one out of the three patients responds positively
to the treatment (X = 1). This outcome indicates that there is a substantial probability of observing a
single successful response among the three patients. The probability of observing two positive
responses (X = 2) decreases significantly to around 14%. This outcome suggests that while it is less
likely, there is still a reasonable chance that two patients might respond positively to the treatment.
Finally, the probability of seeing three positive responses (X = 3) is the lowest at approximately
1.56%. This outcome signifies that all three patients responding positively to the treatment is
quite unlikely, given the 25% success rate. Overall, the probability distribution illustrates
the inherent uncertainty and variability in the response to the treatment. Despite the treatment's
success rate being 25%, the actual outcomes might significantly deviate from this average.

6.0 CONCLUSION

In conclusion, the application of the binomial distribution in this problem allowed us to generate a
probability distribution that provides valuable insights into the possible outcomes of a treatment with
a 25% success rate on three patients. The probability of no response and the probability of 1 response
are both 0.4219, while the probability of 2 responses is 0.1406 and the probability of 3 responses is
0.0156. Based on the histogram, the probability distribution is skewed to the right. The probability
of at most 1 response is 0.8438, the probability of at most 2 responses is 0.9844, and the probability
of at most 3 responses is 1.



LAB 3: ONE-SAMPLE INFERENCE

1.0 OBJECTIVES

i) To learn about distribution of sample means and confidence intervals for means

2.0 INTRODUCTION
One-sample inference is a statistical method for drawing generalizations about a population from a
single data sample. It entails contrasting the sample's features with an estimated value or a known
population parameter.

3.0 MATERIALS AND METHODS


Materials:
Health scale machines meter

Methods
1. The height (cm) of each student in the class was measured carefully.
2. All the measurements were recorded.

4.0 RESULTS AND OBSERVATIONS

X      X²        X      X²        X      X²
155    24025     155    24025     150    22500
160    25600     157    24649     153    23409
163    26569     159    25281     152    23104
152    23104     163    26569     155    24025
150    22500     165    27225     155    24025
148    21904     159    25281     156    24336
162    26244     159    25281     158    24964
165    27225     147    21609     157    24649
170    28900     149    22201     148    21904
∑x = 1425   ∑x² = 226071   ∑x = 1413   ∑x² = 222121   ∑x = 1384   ∑x² = 212916

Total ∑x = 4222     Total ∑x² = 661108

5.0 DATA ANALYSIS


1. Calculate the sample mean

Height distribution in the population has $\mu = 155.5$ and $\sigma = 13.95$.
Height distribution in the sample:

$\bar{X} = \frac{4222}{27} = 156.37$

The sample mean differs from the population mean because the sample mean considers only a selected
number of observations from the population, while the population mean considers all observations in
the population.

2. 95% confidence interval when $\sigma$ is known

$\sigma = 13.95$
$\bar{X} = 156.37$
$z_{\alpha/2} = 1.96$

$\bar{X} - z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right) < \mu < \bar{X} + z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$

$156.37 - 1.96\left(\frac{13.95}{\sqrt{27}}\right) < \mu < 156.37 + 1.96\left(\frac{13.95}{\sqrt{27}}\right)$

$151.10 < \mu < 161.63$

Thus, we are 95% confident that the population mean height lies in the interval 151.10 to 161.63.
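
The interval can be recomputed in a few lines. This is a minimal sketch, assuming Python; the variable names are illustrative, and the limits may differ from the hand calculation in the second decimal place because of rounding.

```python
from math import sqrt

n = 27
x_bar = 4222 / n   # sample mean height (cm)
sigma = 13.95      # known population standard deviation (cm)
z = 1.96           # z value for 95% confidence

margin = z * sigma / sqrt(n)           # z times the standard error of the mean
ci = (x_bar - margin, x_bar + margin)

print(f"{ci[0]:.2f} < mu < {ci[1]:.2f}")
```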

3. Determine the percentiles listed below using t table

$t_{9,\,0.90} = 1.383$
$t_{9,\,0.99} = 2.821$
$t_{9,\,0.95} = 1.833$
$t_{9,\,0.995} = 3.250$

4. 95% confidence interval for $\mu_{height}$

$S^2 = \frac{n\sum x^2 - (\sum x)^2}{n(n - 1)}$

$S^2 = \frac{27(661108) - (4222)^2}{27(27 - 1)}$

$S^2 = 35.088$

$S = \sqrt{35.088} = 5.924$

$\sigma \approx S = 5.924$
$\bar{X} = 156.37$
$z_{\alpha/2} = 1.96$

$\bar{X} - z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right) < \mu < \bar{X} + z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$

$156.37 - 1.96\left(\frac{5.924}{\sqrt{27}}\right) < \mu < 156.37 + 1.96\left(\frac{5.924}{\sqrt{27}}\right)$

$154.135 < \mu < 158.605$

Thus, we are 95% confident that the population mean height lies in the interval 154.135 to
158.605, using the estimated $\sigma$ = 5.924.
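
The computational formula for the variance and the resulting interval can be checked from the column totals alone. A minimal sketch, assuming Python; it follows the report in using z = 1.96 with the estimated standard deviation, and the variable names are illustrative.

```python
from math import sqrt

n = 27
sum_x = 4222      # total of the heights
sum_x2 = 661108   # total of the squared heights

# Computational (shortcut) formula for the sample variance.
s2 = (n * sum_x2 - sum_x**2) / (n * (n - 1))
s = sqrt(s2)      # sqrt(35.088) is about 5.924

x_bar = sum_x / n
margin = 1.96 * s / sqrt(n)
ci = (x_bar - margin, x_bar + margin)

print(f"S^2 = {s2:.3f}, S = {s:.3f}")
print(f"{ci[0]:.3f} < mu < {ci[1]:.3f}")
```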

6.0 DISCUSSION
From the calculation, the mean student height in the sample was compared with the given
population distribution to see whether the sample mean is greater or less than the population mean.
The sample mean was calculated by dividing the sum of all heights by the number of students. The
sample mean is 156.37 cm, which differs from the population mean of 155.5 cm. This is because the
population mean considers all observations in the population, whereas the sample mean is computed
from the sample only. The 95% confidence interval for mean student height is
151.10 cm < $\mu$ < 161.63 cm, obtained with the z formula. The z formula was used in this
calculation because the population standard deviation is known and the sample is reasonably large
(n = 27). The population mean of 155.5 cm falls inside this interval, so the sample is consistent with
the given population mean. A 95% confidence level was used, which leaves a five percent chance
that the interval does not contain the population mean. To obtain the 95% confidence interval, 1.96
standard errors ($\sigma/\sqrt{n}$) are subtracted from and added to the sample mean. For task 3, the
percentiles of the t distribution with 9 degrees of freedom, defined by left-tail area, were determined
as $t_{9,\,0.90}$ = 1.383, $t_{9,\,0.95}$ = 1.833, $t_{9,\,0.99}$ = 2.821 and $t_{9,\,0.995}$ = 3.250.

7.0 CONCLUSION
In conclusion, the z formula was applied to learn about the distribution of the sample mean
and confidence intervals for means. The detailed calculations give enough evidence that the
average student height falls within the expected range.



LAB 4: PAIRED SAMPLES AND THEIR DIFFERENCE

1.0 INTRODUCTION

Paired samples are a type of data in statistics where observations are gathered in pairs or matched
sets. Each pair is made up of two related or connected measurements or observations. The purpose
of collecting paired samples is to compare the differences between the two measurements within
each pair.
There are several steps involved in comparing the differences between the paired samples. The first
step is to state the hypotheses and identify the claim. The next is to find the critical value. After
that, compute the test value and decide whether or not to reject the null hypothesis. Finally,
summarize the results.

2.0 OBJECTIVES

i. To describe differences in paired samples
ii. To calculate a confidence interval for a paired mean difference
iii. To test a paired difference for significance.

3.0 MATERIALS AND METHODS


1) The data from Appendix 1 (Weight (kg) of Obese Women Before and After 12 weeks of
treatment with a very-low-calorie diet (VLCD)) were used, which present paired weight
measurements (kg) in individuals.
2) The difference for each matched pair (DELTA = WOB1 – WOB2) was calculated and a
stem-and-leaf plot of DELTA was constructed. The shape, location and spread of this distribution
were described.
3) The 95% confidence interval for the mean difference was calculated and the result was interpreted.
4) All the steps of hypothesis testing were listed and the result was interpreted.

4.0 RESULTS
Table 1. Weight (kg) of Obese Women Before and After 12-weeks of treatment with a
very-low-calorie-diet (VLCD).
WOB1 117.3 111.4 98.6 104.3 105.4 100.4 81.7 89.5 78.2
WOB2 83.3 85.9 75.8 82.9 82.3 77.7 62.7 69.0 63.9

WOB1
X        X²
117.3    13759.29
111.4    12409.96
98.6     9721.96
104.3    10878.49
105.4    11109.16
100.4    10080.16
81.7     6674.89
89.5     8010.25
78.2     6115.24
∑X = 886.8    ∑X² = 88759.4

$\bar{X} = \frac{117.3 + 111.4 + 98.6 + 104.3 + 105.4 + 100.4 + 81.7 + 89.5 + 78.2}{9} = 98.53$

$S^2 = \frac{n\sum x^2 - (\sum x)^2}{n(n - 1)} = \frac{9(88759.4) - (886.8)^2}{9(9 - 1)} = 172.505$

$S = 13.134$

WOB2
X        X²
83.3     6938.89
85.9     7378.81
75.8     5745.64
82.9     6872.41
82.3     6773.29
77.7     6037.29
62.7     3931.29
69.0     4761
63.9     4083.21
∑X = 683.5    ∑X² = 52521.83

$\bar{X} = \frac{83.3 + 85.9 + 75.8 + 82.9 + 82.3 + 77.7 + 62.7 + 69 + 63.9}{9} = 75.94$

$S^2 = \frac{n\sum x^2 - (\sum x)^2}{n(n - 1)} = \frac{9(52521.83) - (683.5)^2}{9(9 - 1)} = 76.73$

$S = 8.76$

DELTA
X        X²
34       1156
25.5     650.25
22.8     519.84
21.4     457.96
23.1     533.61
22.7     515.29
19       361
20.5     420.25
14.3     204.49
∑X = 203.3    ∑X² = 4818.69

$\bar{X} = \frac{34 + 25.5 + 22.8 + 21.4 + 23.1 + 22.7 + 19 + 20.5 + 14.3}{9} = 22.58$

$S^2 = \frac{n\sum x^2 - (\sum x)^2}{n(n - 1)} = \frac{(9)(4818.69) - (203.3)^2}{9(9 - 1)} = 28.296$

$S = 5.319$
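95% confidence interval

$\bar{D} = 22.58$
$S_D = 5.319$
$t_{\alpha/2} = 2.306$

$\bar{D} - t_{\alpha/2}\left(\frac{S_D}{\sqrt{n}}\right) < \mu_D < \bar{D} + t_{\alpha/2}\left(\frac{S_D}{\sqrt{n}}\right)$

$22.58 - 2.306\left(\frac{5.319}{\sqrt{9}}\right) < \mu_D < 22.58 + 2.306\left(\frac{5.319}{\sqrt{9}}\right)$

$18.49 < \mu_D < 26.66$

Thus, we are 95% confident that the population mean difference lies between 18.49 and 26.66,
based on the sample mean difference of 22.58 from 9 obese women.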
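
The paired differences and their confidence interval can be recomputed from the two weight columns. The following is a minimal sketch, assuming Python; the critical value 2.306 is the t-table entry for 8 degrees of freedom used in this report, the variable names are illustrative, and the limits can differ from the hand calculation by about 0.01 to 0.02 because the report rounds the mean difference to 22.58.

```python
import statistics
from math import sqrt

wob1 = [117.3, 111.4, 98.6, 104.3, 105.4, 100.4, 81.7, 89.5, 78.2]  # before
wob2 = [83.3, 85.9, 75.8, 82.9, 82.3, 77.7, 62.7, 69.0, 63.9]       # after

# One difference per woman: before minus after.
delta = [b - a for b, a in zip(wob1, wob2)]

n = len(delta)
d_bar = sum(delta) / n
s_d = statistics.stdev(delta)   # sample standard deviation (n - 1 denominator)
t_crit = 2.306                  # t table, 8 d.f., 95% confidence, two tails

margin = t_crit * s_d / sqrt(n)
ci = (d_bar - margin, d_bar + margin)

print(f"mean difference = {d_bar:.2f}, S_D = {s_d:.3f}")
print(f"{ci[0]:.2f} < mu_D < {ci[1]:.2f}")
```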

Statistical hypothesis test

$H_0: \mu_D = \mu_0$
$H_1: \mu_D \neq \mu_0$ (Claim)

$\alpha$ = 0.05
d.f. = 9 - 1 = 8
Two tails (95%)
C.V. = ±2.306
$\mu_0$ = 23

t test

$t = \frac{\bar{D} - \mu_0}{S_D/\sqrt{n}}$

$t = \frac{22.58 - 23}{5.319/\sqrt{9}}$

$t = -0.237$

The null hypothesis is not rejected because the test value does not fall in the critical region.
Therefore, there is not enough evidence to support the claim that the mean difference of WOB1 and
WOB2 in the population is different from 23.
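
The test statistic and decision can be reproduced from the summary values. A minimal sketch, assuming Python and using the rounded values reported above (mean difference 22.58, standard deviation 5.319, hypothesised value 23); the variable names are illustrative.

```python
from math import sqrt

d_bar = 22.58    # sample mean of the paired differences (kg)
s_d = 5.319      # sample standard deviation of the differences (kg)
n = 9
mu_0 = 23        # hypothesised mean difference (kg)
t_crit = 2.306   # two-tailed critical value, 8 d.f., alpha = 0.05

t_stat = (d_bar - mu_0) / (s_d / sqrt(n))
reject_h0 = abs(t_stat) > t_crit   # reject only if t falls in the critical region

print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
```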

5.0 DISCUSSION
Referring to task 1, the mean of WOB1 is 98.53 kg, obtained by dividing the sum of all values
in the data set by the number of values. The mean of WOB2 was calculated in the same way,
resulting in 75.94 kg. The standard deviation of WOB1 is 13.134 kg and, by subtracting the lowest
from the highest value, its range is 39.1 kg. The standard deviation of WOB2 is 8.76 kg.
For the second task, it is observed that the distribution of DELTA is right-skewed, as the
highest value (34.0 kg) lies much further from the centre than the lowest value (14.3 kg). The data
spread from 14.3 kg to 34.0 kg, with the centre of the data at 22.58 kg. For the confidence interval,
we are 95% confident that the population mean difference lies between 18.49 kg and 26.66 kg,
based on the sample mean of 22.58 kg from 9 obese women.


6.0 CONCLUSION
In conclusion, the t-test method was used in this experiment to ascertain whether the mean
difference between the two sets of observations differs from the hypothesised value of 23 kg. From
the result, the null hypothesis is not rejected because the test value does not fall in the critical
region. The test of a paired difference for significance was successfully carried out using the
t-test method.



Lab 5: Independent samples and their differences

1.0 Introduction

Two-sample inference considers the problem of estimating and testing differences between two
means. The independent t-test is an inferential statistical test that determines whether the means of
two unrelated groups differ statistically from one another. This test can be used when the means of
exactly two independent groups are being compared. The foundation for analysis of means of two
populations is the fact that if X has a normal distribution in each of two populations with
equal variance $\sigma^2$, then the difference between sample means, $\bar{X}_1 - \bar{X}_2$, also
has a normal distribution. If the populations are not normally distributed and the sample size is not
large enough to appeal to the Central Limit Theorem, then a nonparametric test can be used as an
alternative approach. The nonparametric equivalent of the two-sample t-test is the Wilcoxon rank
sum test. Under optimal conditions the Wilcoxon rank sum test is about 95% as powerful as a
2-sample t-test, although it may be less powerful in specific settings.

2.0 Objectives

• To describe the independent samples
• To estimate a mean difference with 95% confidence
• To conduct an independent t-test

3.0 Methods

1. Side-by-side boxplot
a) The 5-point summaries of the Normal (n=12) and Hypertensive (n=10) groups in the sample
were determined
b) Side-by-side boxplots were constructed
2. Mean and standard deviation
a) The mean and standard deviation of each group were calculated
3. Confidence interval for independent mean difference
a) The pooled estimate of variance and the standard error of the mean difference were calculated
b) The confidence intervals were interpreted
4. A statistical hypothesis test was run


4.0 Results


5.0 Discussion

For this experiment, the dataset was collected from 22 subjects to compare their average daily
sodium ion intakes over a week. Twelve of the 22 subjects are normal while the other 10 are
hypertensive. At the start of this experiment, the normal and hypertensive groups were compared
using boxplots. For the normal group, the lowest value is 0.0 mg and the highest value is 63.6 mg.
The median for the normal dataset is 2.4 mg, while Q1 and Q3 are 0.0 mg and 26.65 mg
respectively. Meanwhile, for the hypertensive group, the lowest value is 11 mg and the highest value
is 250.8 mg. The median for the hypertensive dataset is 58.25 mg. Q1 and Q3 are 39.1 mg and
58.25 mg respectively. Based on the boxplot comparison, the sodium intake of the hypertensive
group is higher than that of the normal group.
The mean of the normal group is 14.42 mg, while the mean of the hypertensive group is
higher at 74.32 mg. Meanwhile, the standard deviation of the hypertensive group, 66.34 mg, is
higher than the standard deviation of the normal group, which is 22.64 mg.
Later in this experiment, the 95% confidence interval for μ1-μ2 was calculated using the t
distribution. The interval obtained was -109.606 < (μ1-μ2) < -10.194. The null hypothesis for the
test is μ1 = μ2 while the alternative hypothesis is μ1 ≠ μ2. The claim for this experiment is the
alternative hypothesis. With 9 degrees of freedom, the critical values are ±2.262. The calculated
test value is -2.73. Because the test value falls inside the critical region, the null hypothesis was
rejected. Thus, there is enough evidence to support the claim.
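
The interval and test value quoted in this discussion can be reproduced from the group summaries. The sketch below assumes Python and assumes, since the worked results are not shown here, that the standard error was computed from the two separate variances with degrees of freedom equal to the smaller sample size minus one; under that assumption the numbers agree with the reported ones to within rounding. The variable names are illustrative.

```python
from math import sqrt

# Group summaries quoted in the discussion (mg).
n1, mean1, sd1 = 12, 14.42, 22.64   # normal group
n2, mean2, sd2 = 10, 74.32, 66.34   # hypertensive group

# Assumption: unpooled standard error of the difference between means.
se = sqrt(sd1**2 / n1 + sd2**2 / n2)
df = min(n1 - 1, n2 - 1)            # conservative degrees of freedom
t_crit = 2.262                      # t table, 9 d.f., two tails, alpha = 0.05

diff = mean1 - mean2
t_stat = diff / se
ci = (diff - t_crit * se, diff + t_crit * se)
reject_h0 = abs(t_stat) > t_crit

print(f"df = {df}, t = {t_stat:.2f}, reject H0: {reject_h0}")
print(f"{ci[0]:.3f} < mu1 - mu2 < {ci[1]:.3f}")
```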

6.0 Conclusion
In a nutshell, examining how the data were collected helps in determining whether the samples are
independent or dependent. Dependent samples are measurements that are paired for a single set of
objects, whereas independent samples are measurements taken on two distinct groups of objects.
Lastly, based on the result, the null hypothesis was rejected because the test value falls inside the
critical region. Thus, there is enough evidence to support the claim.



Lab 6: Chi-Squares Analysis – M&M Statistics

1.0 Introduction

A chi-square test determines how well a model matches observed data. The data used to
calculate a chi-square statistic must be random, raw, mutually exclusive, derived from
independent variables, and drawn from a sample of sufficient size. In this experiment, a
package of M&Ms was used as the sample. The test statistic measures the difference between
the observed counts and the expected counts. Typically, this test is used when working with
discrete data.
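The chi-square (χ²) formula is as follows:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O is the observed number and E is the expected number for each color.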

2.0 Objectives
i) To determine the differences between the observed and expected numbers of M&Ms of each
color in a packet

3.0 Hypothesis
If the Mars Company sorters are working properly, then any differences between the color
percentages in an actual package of M&Ms and the color percentages posted on the web should
be due to random chance.

4.0 Materials and Methods

1. The expected number of M&M’s of each color in the package was calculated by multiplying
the total number of M&M’s in the package by the listed color percentage. The calculation
was recorded in the data table.
2. The difference between the observed and expected numbers was calculated for each M&M
color. The calculations were recorded in the data table.
3. The difference between the observed and expected numbers was squared and the calculation
was recorded.
4. The squared difference was divided by the expected number and the result was recorded in
the table.
5. The chi-square (χ²) value was determined by summing the results from step 4 and was
recorded in the data table.
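
The five steps above can be sketched in Python. This is a minimal illustration under stated assumptions: the function name `chi_square_table` is hypothetical, and the three-color counts and fractions are made-up values chosen only to keep the arithmetic easy to follow; they are not the lab data.

```python
def chi_square_table(observed, expected_fraction):
    """Build the data table row by row and return it with the chi-square value."""
    total = sum(observed.values())
    rows = []
    for color, obs in observed.items():
        expected = total * expected_fraction[color]  # step 1: total x listed fraction
        difference = obs - expected                  # step 2: observed - expected
        squared = difference ** 2                    # step 3: square the difference
        contribution = squared / expected            # step 4: divide by expected
        rows.append((color, obs, expected, difference, squared, contribution))
    chi_square = sum(row[-1] for row in rows)        # step 5: sum of step-4 results
    return rows, chi_square


# Hypothetical example: 40 candies in three colors (not the lab data).
observed = {"red": 12, "yellow": 8, "brown": 20}
expected_fraction = {"red": 0.25, "yellow": 0.25, "brown": 0.50}

rows, chi_square = chi_square_table(observed, expected_fraction)
for row in rows:
    print(row)
print("chi-square =", chi_square)
```

The expected counts are derived from the package total, so the observed and expected columns always add up to the same number, which is a useful check before interpreting the result.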

5.0 Results

6.0 Discussion

An experiment on M&Ms was conducted to determine, using the chi-square test, whether the Mars
company's claim about the colour percentages in a package of M&Ms holds. The chi-square test
allows the claim to be accepted or rejected. The null hypothesis for this experiment is that
there is no difference between the observed and expected values, and the alternative
hypothesis is that there is a difference between the observed and expected values. The claim
is the null hypothesis.
The expected number of each colour in the M&M package was determined by multiplying the
stated percentage of that colour by the total number of M&Ms in the package. The percentages
were taken from the lab manual, and the colour distribution is 20% red, 20% yellow, 10%
orange, 10% blue, 10% green and 30% brown. The total number of M&Ms is 260. Thus, the
expected numbers of M&Ms are 52 red, 52 yellow, 26 orange, 26 blue, 26 green and 78 brown.
Meanwhile, the observed numbers of M&Ms are 41 red, 47 yellow, 37 orange, 40 blue, 45 green
and 49 brown. From these expected and observed numbers, the differences (observed minus
expected) were calculated as -11 for red, -5 for yellow, 11 for orange, 14 for blue, 19 for
green and -29 for brown.
With these values, the chi-square test value was calculated by summing, over all colours,
the square of the difference divided by the expected value. The critical value for this
experiment is 11.071 with 5 degrees of freedom. Meanwhile, the test value calculated from
this experiment is 39.666. Because the test value falls in the critical region, the null
hypothesis was rejected. Thus, there is not enough evidence to support the claim.
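
As a check on the arithmetic, the reported differences and expected numbers can be substituted into the chi-square sum, taking the colours in the order red, yellow, orange, blue, green and brown. The rounding to three decimal places is a choice made here for readability.

```latex
\chi^2 = \frac{(-11)^2}{52} + \frac{(-5)^2}{52} + \frac{11^2}{26}
       + \frac{14^2}{26} + \frac{19^2}{26} + \frac{(-29)^2}{78}
       \approx 2.327 + 0.481 + 4.654 + 7.538 + 13.885 + 10.782
       \approx 39.67
```

This agrees with the reported test value of 39.666 and is well above the critical value of 11.071. Green and brown contribute the most to the total.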

7.0 Conclusion

As the test value falls in the critical region, the null hypothesis is rejected. Thus, there
is not enough evidence to support the claim. In a nutshell, there is a difference between the
observed and expected values.

8.0 Reference
Chavis, C., & Bhuyan, I. A. (2022). Data-Driven Food Desert Metric to Understand Access to
Grocery Stores Using Chi-Square Automatic Interaction Detector Decision Tree Analysis.
Transportation Research Record

LAB 7: Review of a journal article

1. Title and Author of the article.

Title: Preliminary assessment of air pollutant sources identification at selected monitoring
stations in Klang Valley, Malaysia

Authors: Nur Diyana Mohamad, Zulfa Hanan Ash’aari, Melawani Othman.

2. Problem statement.

The Klang Valley region of Malaysia has experienced air pollution and haze episodes before.
Haze events are becoming a source of disputes between neighboring nations. Additionally, the
tendency of haze to drift into neighboring nations eventually results in pollution in those
nations. People, as well as animals, crops, and other assets, are known to be severely
affected by the worst haze events. ANOVA, or analysis of variance, was used to compare the
means of several variables or parameters using a single comparison factor.

3. Hypothesis.

If the readings of nitrogen dioxide (NO2), carbon monoxide (CO) and particulate matter
(PM10) are high, the air pollution index (API) increases.

4. Objective.

1) To investigate the pollutant sources through a preliminary assessment using a statistical
approach.

5. Experimental design (method).

Klang Valley was chosen as the study region because of its recent substantial contribution to
air pollution. Site coding was taken from the Department of Environment (DOE). Air quality was
monitored continuously from January 2009 until December 2013. The results were analyzed using
one-way analysis of variance (ANOVA). In addition, principal component analysis (PCA), a
technique for identifying linear combinations of the original variables that account for the
variance in those variables, was applied. The variables can be successfully clustered in this
manner. Because less significant factors are eliminated from the data set with very little loss
of the original information, PCA yields the most significant and meaningful variables indicating
the source of the variation.

6. Statistical analysis used.

ANOVA was chosen for the statistical analysis because it is suited to comparing three or more
groups. ANOVA partitions the total variance among its sources and tests for group differences by
comparing the means of more than two groups. It is one of the most basic kinds of quantitative
analysis and generalizes the t-test to more than two groups.
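
The partitioning of variance described above can be sketched in Python. This is a minimal illustration of a one-way ANOVA F statistic, not the analysis performed in the article: the function names are hypothetical and the three groups of readings are made-up numbers, not data from the study.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def one_way_anova_f(groups):
    """Return the F statistic and its degrees of freedom for a one-way ANOVA."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)

    # Variation of the group means around the grand mean (between groups).
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each reading around its own group mean (within groups).
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)

    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical pollutant readings at three sites (illustrative values only).
site_a = [1.0, 2.0, 3.0]
site_b = [2.0, 3.0, 4.0]
site_c = [5.0, 6.0, 7.0]

f_value, df_between, df_within = one_way_anova_f([site_a, site_b, site_c])
print("F =", f_value, "with df =", (df_between, df_within))
```

A large F value means the variation between group means is large relative to the variation within groups. The F value is then compared with a critical value, or converted to a p-value, to decide whether the means differ significantly.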

7. Findings.

A preliminary evaluation was carried out to examine the air pollution trends and possible emission
sources in selected locations of Klang Valley. The ANOVA results showed statistically significant
differences in the means across both sites and years. In particular, the differences across sites were
statistically significant, with p-values < 0.05 for each category of air pollutant. PCA was also shown
to be a good statistical tool for identifying air pollutant sources. The PCA results were dominated by
CO, NO2, and PM10, the major causes of outdoor air pollution in Klang Valley, followed by SO2 and O3.
The addition of these linked data can eventually help this study provide a more convincing explanation
of the relationship or correlation between air pollutants and the associated factors that influenced the
behavior of air quality levels in that specific area.

8. Reference

Mohamad, N. D., Ash’aari, Z. H., & Othman, M. (2015). Preliminary assessment of air pollutant sources
identification at selected monitoring stations in Klang Valley, Malaysia. Procedia Environmental
Sciences, 30, 121–126. https://doi.org/10.1016/j.proenv.2015.10.021

9. Appendix
https://www.sciencedirect.com/science/article/pii/S1878029615006155
