570 - Assignment 02 - Nguyen Ngoc Van Anh
Student declaration
I certify that the assignment submission is entirely my own work and I fully understand the consequences of plagiarism.
I understand that making a false declaration is a form of malpractice.
Student Signature
Grading grid
P3 P4 P5 M2 M3 M4 D1 D2 D3
Description of activity undertaken
Assessor name:
Summative Feedbacks Resubmission Feedbacks
Table of Contents
I. Introduction
II. Analysing and evaluating qualitative and quantitative raw business data from a range of examples using appropriate statistical methods
1. Differences between quantitative and qualitative raw data analysis
2. Descriptive statistics
2.1 Measures of Central Tendency
2.2 Measures of Variability
3. Inferential statistics
3.1 The differences between Population and Sample
3.2 One sample T-test: Estimation and Hypotheses testing
3.2.1 The mean Estimation
3.2.2 Hypotheses testing of a sample
Two-tailed testing
One-tailed testing
3.3 Two samples T-test and Independent Sample T-test: Estimation and Hypotheses testing
3.3.1 Independent Samples
Estimation
Hypothesis testing (2-tailed)
3.3.2 Dependent Samples
Estimation
Hypothesis testing
4. Measuring the association between two variables (from the dataset)
4.1 Correlation analysis
4.2 Regression analysis and simple forecasting
Simple linear regression model
Multiple linear regression model
5. The differences in application among descriptive, exploratory and confirmatory analysis techniques in general
III. Applying a range of statistical methods used in business planning for quality, inventory and capacity management
1. Measuring the variability in business processes or quality management
2. Measuring the probability by using probability distributions to business operations and processes
2.1 Normal distribution
2.2 Poisson distribution
2.3 Binomial distribution
2.4 Inference
3. Evaluations and recommendations for improving business planning through statistical methods above
IV. Using appropriate charts and tables to communicate findings of given variables
1. Analysing data and interpreting results by using frequency distribution tables, graphs, and charts
2. The strengths and weaknesses of using different types of charts and tables
3. The most effective way of communicating the results of the analysis
V. Conclusion
VI. References
I. Introduction
Statistics plays a significant role in every enterprise or organization. It provides the economic,
financial, and marketing information that a corporation needs to succeed in the marketplace. This
assignment discusses how to analyse and evaluate raw business data using some common statistical
methods. These methods are then applied to the business planning of a specific company. In addition,
all findings and outcomes are presented in appropriate graphs and tables.
ABC company specializes in distributing good-quality stationery in Ho Chi Minh. It is a distribution
agent for many famous brands that are trusted by many customers, and it provides a full range of
commonly used items such as pens, brushes, binders, and all kinds of printing and photocopying
paper. ABC company has had relatively stable revenue for each product, but several new competitors
have recently entered the market, so the range of goods on offer grows more diverse day by day. This
causes certain difficulties for ABC company's business and is the reason for this research.
For that reason, the author, acting as a research analyst, will compile and analyse ABC company's
sales data to gain a deeper understanding of its business performance from 2019 to 2020. The results
of this research will be the basis for the company's upcoming business plan to improve product
quality and the effectiveness of its marketing activities.
This research matters for the analysis because it gives the researcher an accurate and comprehensive
view of the company's figures, data, and business context. When the analysis is conducted, the data
on goods, revenue, and profit will be summarized accurately and completely. The statistical results
can therefore support the company's managers in making new decisions and strategies more
effectively.
This study uses secondary research. The sales data of ABC company is obtained from a published
source on the website https://data.world/ . The data table includes 9 variables (Order date, Region,
Item, Units sold, Unit cost, Unit price, Total revenue, Total Cost, and Profit), with 50
observations.
II. Analysing and evaluating qualitative and quantitative raw business data from a range of
examples using appropriate statistical methods
Quantitative data
Quantitative data are always numbers and result from counting or measuring attributes of a
population. This kind of data may be discrete or continuous, and continuous data is further divided
into interval data and ratio data. Researchers normally prefer quantitative data to qualitative
(categorical) data because it is easier to analyse mathematically (Holmes, Illowsky and Dean, n.d.).
Qualitative data
Qualitative data is the outcome of categorizing or describing a population's characteristics. It is
also called categorical data, is generally expressed in words or letters, and cannot be measured
numerically. Nominal data and ordinal data are the two types of qualitative data. Nominal data is
used to name or describe variables, whereas ordinal data is used to rank them on a scale (Holmes,
Illowsky and Dean, n.d.).
Table 1: Example of Quantitative data
(Source: Attached excel file, 2021)
Each type of data analysis has its own benefits and drawbacks. The following part examines these
aspects of quantitative and qualitative data analysis.
Advantages of quantitative data analysis
• Larger sample sizes allow a more comprehensive interpretation of the findings and let analysts
establish more assumptions about the target audience of the study.
• Data collected through self-completion activities such as online surveys is normally anonymous,
which helps especially when dealing with sensitive issues.
• Online and mobile surveys are faster and easier to conduct, so analysts can receive findings in
real time.
Advantages of qualitative data analysis
• The data is collected at a more individual level and can go into further detail, so it is used to
obtain deeper knowledge of participants' thoughts and activities and to produce or investigate a
hypothesis in greater depth.
• It encourages discussion because it is conducted in a more open way rather than following a
predetermined set of questions. This gives the investigation context instead of just data.
• It is flexible: the interviewer can explore questions related to the research topic that they feel
are important, or that they had not considered before the conversation, and can vary the setting.
Disadvantages of quantitative data analysis
• Because analysts are limited by the survey's predetermined responses, they cannot delve as deeply
into behaviours, opinions, and causes as they can in qualitative research. This is especially true
of self-completion (online) questionnaires.
Disadvantages of qualitative data analysis
• … the research topic, rather than a collection of people with differing viewpoints. This is more
valuable, especially if they are having a debate about different viewpoints throughout focus groups.
2. Descriptive statistics
2.1 Measures of Central Tendency
A measure of central tendency is a single value that describes a data set by identifying the central
position within it. The mean, mode, and median are all common measures of central tendency (Holmes,
Illowsky and Dean, n.d.).
Mean
The most common and well-known measure of central tendency is the mean, or average. It can be used
with both continuous and discrete data, although it is most often used with continuous data
(statistics.laerd.com, 2021). The mean is calculated by dividing the sum of all values in the data
set by the number of values in the data set, according to the following basic formula:
$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n}$$
(Source: statistics.laerd.com, 2021)
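As a quick illustration of the formula above, the following Python sketch computes a mean both by hand and with the standard library. The revenue figures are hypothetical placeholders, not values from the ABC company dataset.

```python
from statistics import mean

# Hypothetical revenue values (NOT taken from the attached Excel file)
total_revenue = [120.0, 340.5, 89.9, 560.0, 410.1]

# Formula from the text: sum of all values divided by the number of values
manual_mean = sum(total_revenue) / len(total_revenue)

# The standard library function should agree with the manual calculation
library_mean = mean(total_revenue)

print(manual_mean, library_mean)
```

Every value contributes to the result, which is why a single very large or very small observation can shift the mean noticeably.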
Furthermore, one of the most essential properties of the mean is that it includes every value in the
data set in its calculation. For example, to calculate the mean of the variable "Total Revenue" in
the attached excel file, the author used the "Average" function.
(Source: Attached excel file, 2021)
Therefore, the mean value of the variable "Total Revenue" is $504.64. It means that the average
total revenue earned by ABC company based on the database is $504.64. This value is represented on
the following histogram:
Mode
The mode is the value that occurs most frequently in the data set. To put it another way, it is the
most typical value in a group of data. There is no mathematical formula for the mode, and it is
typically used for categorical data because it considers only the most frequently recurring elements
of the collection (statistics.laerd.com, 2021).
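Because the mode is found by counting rather than by a formula, a short sketch can make the idea concrete. The lists below are hypothetical examples, not values from the attached excel file; `multimode` from Python's standard library returns every value tied for the highest frequency.

```python
from statistics import multimode

# Hypothetical unit prices (NOT taken from the attached Excel file)
unit_prices = [1.99, 8.99, 4.99, 8.99, 19.99, 1.99, 8.99]

# Hypothetical categorical data: the mode also works for labels
items = ["pen", "binder", "pen", "brush", "pen", "binder"]

price_modes = multimode(unit_prices)  # values with the highest count
item_modes = multimode(items)

print(price_modes, item_modes)
```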
For example, the author used SPSS software to calculate the mode of the variable “Unit Price” from
the attached excel file. The results from SPSS are shown below.
Median
The median is the middle value of a list of numbers sorted in ascending or descending order. It can
also be used for interval or ordinal data. When a dataset has an even number of items, the median is
the average of the two middle values (byjus.com, 2021). In theory, the median is calculated by the
following formulas:
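• Median = $\left(\frac{n+1}{2}\right)^{\text{th}}$ value (in case n is an odd number) (Holmes, Illowsky and Dean, n.d.).
• Median = $\frac{\left(\frac{n}{2}\right)^{\text{th}} + \left(\frac{n}{2}+1\right)^{\text{th}}}{2}$ (in case n is an even number) (Holmes, Illowsky and Dean, n.d.).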
Example 1: The following dataset has an odd number of observations that are organized in
descending order: 23, 21, 18, 16, 15, 13, 12, 10, 9, 7, 6.
In this case: n = 11 ➔ Median value = $\left(\frac{11+1}{2}\right)^{\text{th}}$ = 6th value = 13
Example 2: The following dataset has an even number of observations that are organized in
descending order: 23, 21, 18, 16, 15, 13, 12, 10, 9, 7, 6, 4.
In this case: n = 12 ➔ Median value = $\frac{\left(\frac{12}{2}\right)^{\text{th}} + \left(\frac{12}{2}+1\right)^{\text{th}}}{2} = \frac{6^{\text{th}} + 7^{\text{th}}}{2} = \frac{13 + 12}{2} = 12.5$
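The two positional formulas can be checked with a short Python sketch using the datasets from Example 1 and Example 2. The helper name `median_by_position` is introduced here only for illustration.

```python
def median_by_position(values):
    """Median via the positional formulas: the middle value for odd n,
    or the average of the two middle values for even n."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # ((n + 1) / 2)-th value, converted to a 0-based index
        return ordered[(n + 1) // 2 - 1]
    # average of the (n / 2)-th and (n / 2 + 1)-th values
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

odd_data = [23, 21, 18, 16, 15, 13, 12, 10, 9, 7, 6]
even_data = [23, 21, 18, 16, 15, 13, 12, 10, 9, 7, 6, 4]

median_odd = median_by_position(odd_data)
median_even = median_by_position(even_data)
print(median_odd, median_even)
```

The sort direction does not matter: the middle position is the same whether the list is ascending or descending.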
In this statistical task of ABC company, the table below presents the median values of the two
variables “Unit Sold” and “Unit Price” from the database in the attached excel file, calculated
using SPSS.
Based on this table, the median value of the “Unit Sold” variable is 50.50 and the median value of
the “Unit Price” variable is 8.99.
2.2 Measures of Variability
A measure of variability is a summary statistic that indicates the amount of dispersion in a data
set. In statistics, the range, variance, and standard deviation are the three most commonly used
measures of variability (Holmes, Illowsky and Dean, n.d.). These three measures are examined in
detail in the following paragraphs.
Range
The range is usually the simplest measure of variability to compute and understand. The range of a
data set is the distance between its highest and lowest values. In other words, the range is
calculated as the maximum value minus the minimum value (onlinestatbook.com, n.d.).
In this case: Maximum value = 14 and Minimum value = 2 ➔ Range value = 14 – 2 = 12
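A minimal Python sketch of the same calculation follows. The list is a hypothetical dataset chosen only so that its maximum is 14 and its minimum is 2, matching the figures above.

```python
# Hypothetical dataset with maximum 14 and minimum 2 (assumed for illustration)
values = [2, 5, 8, 11, 14]

# Range = maximum value minus minimum value
data_range = max(values) - min(values)
print(data_range)
```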
Example 2: In the statistical task of ABC company, the range of the “Profit” variable in the
attached excel file is calculated using SPSS. The following table shows this result.
Variance
In statistics, variance describes a distribution in terms of how closely observations cluster
together or how far they spread apart. The variance is defined as the average squared difference of
the values from the mean, which is used as the measure of the centre of the distribution. To
calculate the variance, the distance between each observation and the mean is squared, the squares
are summed, and the sum is divided by N − 1 for a sample or by N for a population
(onlinestatbook.com, n.d.). In addition, variance is an essential step in calculating a
distribution's standard deviation. There are two formulas, one for the variance of a sample and one
for the variance of a population:
$$s^2 = \frac{\sum (X_i - \bar{X})^2}{N - 1} \qquad \sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$$
Example 1: Consider the sample dataset 8, 9, 7, 6, 5. To calculate the variance, the author uses the
sample variance formula.
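Working through the sample variance formula for this dataset step by step gives the following. The result agrees with the value of 2.5 used for the standard deviation of the same dataset later in this section.

```latex
\bar{X} = \frac{8 + 9 + 7 + 6 + 5}{5} = 7
\qquad
s^2 = \frac{(8-7)^2 + (9-7)^2 + (7-7)^2 + (6-7)^2 + (5-7)^2}{5 - 1}
    = \frac{1 + 4 + 0 + 1 + 4}{4} = 2.5
```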
Example 2: In the statistical task of ABC company, the variance of the “Unit Cost” variable in the
attached excel file is calculated using SPSS and Excel. The two following tables show this finding.
(Source: Attached excel file, 2021)
From the results in the above table, the variance of the “Unit Cost” variable is 295.395. A large
variance indicates that the values in the data set are far from the mean and highly variable, while
a small variance indicates the opposite.
Standard Deviation
For continuous distributions, the standard deviation is an essential measure of variability. In
statistics, the standard deviation is regarded as the common language for describing fluctuation. In
other words, it enables a better understanding of how much observations deviate from the average as
well as how much general variability a distribution contains. Additionally, particular observations
can be interpreted in terms of their relationship to the mean by using standard deviations
(scalestatistics.com, 2021).
Standard deviation is calculated by taking the square root of the variance in a distribution
(onlinestatbook.com, n.d.).
Sample standard deviation: $s = \sqrt{s^2}$
Population standard deviation: $\sigma = \sqrt{\sigma^2}$
Example 1: Consider the sample dataset 8, 9, 7, 6, 5. To calculate the standard deviation, the
author uses the sample standard deviation formula.
➔ $s = \sqrt{s^2} = \sqrt{2.5} = 1.58$
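The same figures can be reproduced with Python's standard library, as sketched below. `variance` and `stdev` use the sample formulas (dividing by n − 1), while `pvariance` uses the population formula (dividing by n); the population value is shown only to illustrate the difference between the two.

```python
from statistics import pvariance, stdev, variance

sample = [8, 9, 7, 6, 5]

sample_variance = variance(sample)       # divides by n - 1
sample_sd = stdev(sample)                # square root of the sample variance
population_variance = pvariance(sample)  # divides by n, for comparison only

print(sample_variance, round(sample_sd, 2), population_variance)
```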
Example 2: With the data of the variable “Unit Cost” from the attached excel file, the author has
used SPSS software to calculate the sample standard deviation.
3. Inferential statistics
In general, inferential statistics uses a random sample of data taken from a population to describe
and draw conclusions about that population. In other words, inferential statistics enables
researchers to make conclusions (“inferences”) from the dataset. Moreover, analysts can use
inferential statistics to develop generalizations about a population based on data from samples
(statisticshowto.com, 2021).
There are two important aspects of inferential statistics: estimation and hypothesis testing
(statisticshowto.com, 2021). Estimation is divided into point estimates and interval estimates,
while hypothesis testing consists of one-tailed tests and two-tailed tests (Holmes, Illowsky and
Dean, n.d.).
Every researcher should be aware of the difference between population and sample because these
are important concepts in research. The distinction between a specific population and a sample is
simple to grasp.
Firstly, a population is defined as the set of all the elements under investigation that share one
or more common properties. In other words, a population is made up of all members of a specific
group, as well as all conceivable outcomes or measurements. Furthermore, a population does not have
to be made up of only people; it can also contain animals, events, items, structures, and so on. The
specific population is determined by the study's scope (Surbhi, 2017).
Secondly, a sample is a subset of the population, chosen at random from the research participants.
The sample should be chosen so that it accurately represents the population in all of its properties
and is free of bias, forming a small cross-section of the population, because the statistical
inferences from a sample are used to make population-wide generalizations (Surbhi, 2017).
In addition, when conducting statistical tests on a large scale, researchers commonly use samples
rather than populations because it is hard to collect enough data and information from the whole
population.
3.2 One sample T-test: Estimation and Hypotheses testing
In statistics, the one-sample t-test is a hypothesis test used to examine whether an unknown
population mean differs from a specific value (stattrek.com, 2021). In this section, the mean
estimation and hypothesis testing of a single sample are explained in detail.
In statistics, an estimate is a value calculated from a sample that is expected to represent the
corresponding value in the population. There are two types of estimate: the point estimate and the
interval estimate (Holmes, Illowsky and Dean, n.d.). This section considers interval estimation.
Interval estimation is a way of using sample data to calculate a range of possible values. A
confidence interval provides an estimated range of values that is likely to include an unknown
population parameter, with the range calculated from a particular sample (Holmes, Illowsky and Dean,
n.d.).
As σ known: $$\mu = \bar{X} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$
μ: Population mean
X̄: Sample mean
α: Significance level
z_{α/2}: The z-value of the standard normal distribution
σ: Population standard deviation
n: Sample size
As σ unknown: $$\mu = \bar{X} \pm t_{(n-1)}^{\alpha/2} \cdot \frac{S}{\sqrt{n}}$$
μ: Population mean
X̄: Sample mean
α: Significance level
t_{(n-1)}^{α/2}: The value of the Student (t) distribution with n – 1 degrees of freedom
S: Sample standard deviation
n: Sample size
Example 1: Find the population mean income at the 1% significance level, given that a sample of 30
people has a mean of $750 and a sample standard deviation of $65. Let's calculate the interval
estimate of the population mean.
Applying the formula: $$\mu = \bar{x} \pm t_{(n-1)}^{\alpha/2} \cdot \frac{s}{\sqrt{n}} = 750 \pm 2.756 \cdot \frac{65}{\sqrt{30}} = 750 \pm 32.71$$
Conclusion: At the 1% significance level, the population mean income lies between $717.29 and
$782.71.
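The interval in Example 1 can be reproduced with a few lines of Python. This is a sketch only: the critical value 2.756 is copied from the calculation above rather than computed, and the variable names are chosen for illustration.

```python
import math

n = 30
sample_mean = 750.0
sample_sd = 65.0
# t value for alpha/2 = 0.005 with n - 1 = 29 degrees of freedom,
# taken from the worked example rather than computed here
t_critical = 2.756

# Margin of error: t * s / sqrt(n)
margin = t_critical * sample_sd / math.sqrt(n)
lower = sample_mean - margin
upper = sample_mean + margin

print(round(margin, 2), round(lower, 2), round(upper, 2))
```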
Example 2: In the statistical task of ABC company, the author estimates the population mean of the
"Profit" variable in Excel, based on the database in the attached excel file, at the 95% level of
confidence (significance level = 0.05).
The table above shows the findings: at the 95% confidence level, the sample mean is 263.70 and the
margin of error of the interval estimate of the population mean is 34.52. From these, the lower
limit is 229.18 and the upper limit is 298.21. Therefore, the population mean of the “Profit”
variable is estimated to lie between 229.18 and 298.21 at 95% confidence.
A statistician makes judgments about findings through a process known as "hypothesis testing". A
hypothesis test entails gathering and analysing data from a sample. The statistician then decides,
based on the analysis, whether there is enough evidence to reject the null hypothesis. The test
starts with two hypotheses being considered: the null hypothesis (H0) and the alternative hypothesis
(Ha). These two hypotheses express opposing points of view (Holmes, Illowsky and Dean, n.d.).
• H0: The null hypothesis states that there is no difference between a sample mean or proportion and
a population mean or proportion. In other words, the difference is zero (Holmes, Illowsky and Dean,
n.d.).
• Ha: The alternative hypothesis is a claim about the population that contradicts H0 and is what
analysts conclude when they reject H0 (Holmes, Illowsky and Dean, n.d.).
The following table shows the different hypotheses in the relevant pairs. In theory, H0 is always
the one that contains the equal (=) sign.
Step 1: Specifying the Null Hypothesis (H0) and the Alternative Hypothesis (Ha)
Step 3: Collecting the sample data and computing the value of the test statistic.
Step 4: Using the level of significance to determine the critical value and the rejection rule.
Step 5: Using the value of the test statistic and the rejection rule to determine whether to
reject H0 or not (nedarc.org, 2019).
Furthermore, a hypothesis test about the value of a population mean μ must typically take one of the
three forms listed below.
For the hypothesis testing of a sample, the standard deviation of the population usually cannot be
determined (for example, because the population is too large to collect sufficient data), so
statisticians have to use the t-value rather than the z-value to test hypotheses H0 and Ha.
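Test statistic: $$t = \frac{(\bar{x} - \mu_o)\sqrt{n}}{s}$$
• H0: $\mu = \mu_o$, Ha: $\mu \neq \mu_o$ ➔ reject H0 if $|t| \geq t_{n-1}^{\alpha/2}$
• H0: $\mu \leq \mu_o$, Ha: $\mu > \mu_o$ ➔ reject H0 if $t \geq t_{n-1}^{\alpha}$
• H0: $\mu \geq \mu_o$, Ha: $\mu < \mu_o$ ➔ reject H0 if $t \leq -t_{n-1}^{\alpha}$
Where
$\mu_o$: The value of a constant
$\bar{x}$: The mean of the sample data
n: Sample size
s: Sample standard deviation
$t_{n-1}^{\alpha/2}$ or $t_{n-1}^{\alpha}$: The value from the Student (t) distribution table with degrees of freedom DF = n - 1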
Two-tailed testing
Example 1: An "Ao Dai" designer assumes that the mean height of adult females is equal to 161 cm. As
evidence, a sample of 41 adult females had an average height of 165 cm, and the standard deviation
is known to be 12 cm. Let's test whether the Ao Dai designer's claim is right or wrong at the 5%
level of significance.
Step 1: Assuming
H0: The mean height of adult females is equal to 161 cm ➔ $\mu = \mu_o = 161$ cm
Ha: The mean height of adult females is NOT equal to 161 cm ➔ $\mu \neq \mu_o = 161$ cm
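The remaining steps for this example can be sketched in Python as follows. Two things here are assumptions rather than content from the text: the 12 cm figure is treated as the sample standard deviation so that the t statistic applies, and the critical value 2.021 is the usual t-table entry for a two-tailed test at the 5% level with 40 degrees of freedom. The function name `one_sample_t` is hypothetical.

```python
import math

def one_sample_t(sample_mean, mu0, sd, n):
    """t statistic for a one-sample test: (x_bar - mu0) * sqrt(n) / s."""
    return (sample_mean - mu0) * math.sqrt(n) / sd

# Figures from the Ao Dai example
t_stat = one_sample_t(sample_mean=165, mu0=161, sd=12, n=41)

# Assumed t-table value: alpha/2 = 0.025, df = 41 - 1 = 40
t_critical = 2.021

# Two-tailed rejection rule: reject H0 if |t| >= critical value
reject_h0 = abs(t_stat) >= t_critical

print(round(t_stat, 3), reject_h0)
```

Under these assumptions the statistic is about 2.134, which exceeds 2.021, so H0 would be rejected at the 5% level.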
Example 2: In the statistical task of ABC company, the author compares the mean of the "Unit cost"
variable in the attached excel file with a test value of 20 at the 95% level of confidence
(significance level = 0.05).
Based on that: H0 is assumed that the mean of the "Unit cost" variable is equal to 20.
Ha is assumed that the mean of the “Unit cost” variable is NOT equal to 20.
(Source: attached spv file, 2021)
• The One-Sample Statistics table shows a mean of 9.076 for the "Unit cost" variable, which is less
than 20. The t-test value is t = -4.494, corresponding to a significance level of 0.000 < 0.05.
• Therefore, H0 is rejected and Ha is accepted, which means that the mean of the “Unit cost”
variable is NOT equal to 20.
One-tailed testing
Example 1: A sample of 30 people has an average weight of 45 kg and a standard deviation of 5 kg.
Let's test the claim that the population mean is less than or equal to 42 kg at the 95% level of
confidence.
Step 1: Assuming
H0: The mean weight of people is less than or equal to 42 kg ➔ $\mu \leq \mu_o = 42$ kg
Ha: The mean weight of people is greater than 42 kg ➔ $\mu > \mu_o = 42$ kg
Step 5: Conclusion: Reject H0 (or Accept Ha), which means the mean weight of people is greater
than 42 kg.
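The intermediate calculation behind this conclusion can be sketched in Python. The critical value 1.699 is an assumption: it is the usual t-table entry for a one-tailed test at the 5% level with 29 degrees of freedom and does not appear in the text.

```python
import math

n = 30
sample_mean = 45.0
sample_sd = 5.0
mu0 = 42.0

# Test statistic: (x_bar - mu0) * sqrt(n) / s
t_stat = (sample_mean - mu0) * math.sqrt(n) / sample_sd

# Assumed t-table value: alpha = 0.05, df = 30 - 1 = 29 (one-tailed)
t_critical = 1.699

# Right-tailed rejection rule for H0: mu <= mu0
reject_h0 = t_stat >= t_critical

print(round(t_stat, 2), reject_h0)
```

Because the statistic (about 3.29) is larger than the critical value, the sketch reaches the same decision as Step 5.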
Example 2: In the statistical task of ABC company, the author tested whether the mean of the "Unit
Sold" variable in the attached excel file is less than or equal to 51 at the 95% level of confidence
(significance level = 0.05) and the 99% level of confidence (significance level = 0.01), using both
the t-value and the p-value of the test statistic.
Based on that: H0 is assumed that the mean of the "Unit sold" variable is less than or equal to 51.
Ha is assumed that the mean of the “Unit sold” variable is greater than 51.
(Source: Attached excel file, 2021)
After testing H0 in Excel, the above table presents the findings:
• t-multiple at 5% level of significance = -1.6766
• t-multiple at 1% level of significance = -2.4049
• t-value test = -0.6454
• p-value test = 0.7392
Comparison:
By using t-value
• At 5% level of significance: |t-value| = |−0.6454| = 0.6454 < |t-multiple| = |−1.6766| = 1.6766
➔ Accept H0
• At 1% level of significance: |t-value| = |−0.6454| = 0.6454 < |t-multiple| = |−2.4049| = 2.4049
➔ Accept H0
By using p-value
• At 5% level of significance: p-value = 0.7392 > alpha = 0.05 ➔ Accept H0
• At 1% level of significance: p-value = 0.7392 > alpha = 0.01 ➔ Accept H0
Conclusion: Hypothesis H0 is accepted and Ha is rejected, which means that the mean of the “Unit
sold” variable is less than or equal to 51.
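The p-value comparison used in this example follows a simple rule that can be sketched in Python. The function name `decide` is hypothetical, and the wording of the decisions mirrors the report.

```python
def decide(p_value, alpha):
    """Decision rule: reject H0 when the p-value is below the significance level."""
    return "Reject H0" if p_value < alpha else "Accept H0"

p_value = 0.7392  # p-value reported for the "Unit sold" test

# Apply the rule at both significance levels used in the example
decisions = {alpha: decide(p_value, alpha) for alpha in (0.05, 0.01)}
print(decisions)
```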
3.3 Two samples T-test and Independent Sample T-test: Estimation and Hypotheses
testing.
3.3.1 Independent Samples
❖ Estimation
To estimate a difference of population means using independent samples, the t distribution is used
because the population standard deviations are not available (assuming that the population
distributions are Normal) (Holmes, Illowsky and Dean, n.d.). In statistics, there are two
situations, which differ in the formulas that need to be used:
To check whether the standard deviations of the two samples are similar or different, a "rough rule"
is used: divide the larger standard deviation by the smaller. If the result is < 1.5, they are
usually treated as similar; otherwise they are treated as different (Holmes, Illowsky and Dean,
n.d.).
Example of Situation 1: Let’s estimate the difference between the mean weight of male and female
people at 95% level of confidence (𝜶 = 𝟎. 𝟎𝟓)
Male Female
Mean 𝑥̅1 = 60 kg 𝑥̅2 = 55 kg
Standard deviation 𝑠1 = 4 kg 𝑠2 = 3 kg
Sample size 𝑛1 = 19 𝑛2 = 21
In this case $s_1 : s_2 = 4 : 3 = 1.33 < 1.5$ ➔ the Standard Deviations are SIMILAR
Step 1: Degree of freedom (DF) when the standard deviations are similar:
DF = $n_1 + n_2 - 2$ = 19 + 21 – 2 = 38
Step 2: Looking at Critical value of t table ➔ $t_{DF}^{\alpha/2} = t_{38}^{0.025} = 2.024$
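Step 3: Similar (pooled) variance of the two independent samples:
$$S_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(19 - 1)4^2 + (21 - 1)3^2}{38} = 12.316$$
Step 4: Difference of Population Means:
$$\mu_1 - \mu_2 = \bar{x}_1 - \bar{x}_2 \pm t_{DF}^{\alpha/2} \cdot \sqrt{S_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
➔ $$\mu_1 - \mu_2 = 60 - 55 \pm 2.024 \times \sqrt{12.316 \left(\frac{1}{19} + \frac{1}{21}\right)} = 5 \pm 2.25 = (2.75 ; 7.25)$$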
Step 5: Conclusion.
We are 95% confident that the difference between the male and female average weight is 5 ± 2.25 =
(2.75 ; 7.25). This means that, on average, males are heavier than females by between 2.75 and 7.25
kg according to the above statistical result.
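The pooled-variance interval for Situation 1 can be reproduced with the Python sketch below. The critical value 2.024 is copied from Step 2 rather than computed, and the variable names are chosen for illustration.

```python
import math

# Figures from the male / female weight example
n1, mean1, s1 = 19, 60.0, 4.0
n2, mean2, s2 = 21, 55.0, 3.0
t_critical = 2.024  # t value for alpha/2 = 0.025, df = 38, taken from Step 2

# Degrees of freedom and pooled variance for similar standard deviations
df = n1 + n2 - 2
pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df

# Margin of error and confidence interval for mu1 - mu2
margin = t_critical * math.sqrt(pooled_var * (1 / n1 + 1 / n2))
diff = mean1 - mean2
lower, upper = diff - margin, diff + margin

print(df, round(pooled_var, 3), round(margin, 2), round(lower, 2), round(upper, 2))
```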
Example of Situation 2: Let’s estimate the difference of the mean GPA score between female and
male students at 99% level of confidence (𝜶 = 𝟎. 𝟎𝟏)
Female Male
Mean 𝑥̅1 = 6.3 𝑥̅2 = 5.8
Standard deviation 𝑠1 = 3 𝑠2 = 1.8
Sample size 𝑛1 = 25 𝑛2 = 15
In this case $s_1 : s_2 = 3 : 1.8 = 1.66 > 1.5$ ➔ the Standard Deviations are DIFFERENT
Step 1: Degree of freedom (DF) when the standard deviations are different.
$$DF = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2/n_2\right)^2}{n_2 - 1}} = \frac{\left(\frac{3^2}{25} + \frac{1.8^2}{15}\right)^2}{\frac{\left(3^2/25\right)^2}{25 - 1} + \frac{\left(1.8^2/15\right)^2}{15 - 1}} = 37.99 \approx 38$$
Step 2: Looking at Critical value of t table ➔ $t_{DF}^{\alpha/2} = t_{38}^{0.005} = 2.712$
Step 3: Difference of Population Means:
➔ $$\mu_1 - \mu_2 = 6.3 - 5.8 \pm 2.712 \times \sqrt{\frac{3^2}{25} + \frac{1.8^2}{15}} = 0.5 \pm 2.06 = (-1.56 ; 2.56)$$
Step 4: Conclusion:
We are 99% confident that the difference between the female and male students’ average GPA scores is
0.5 ± 2.06 = (-1.56 ; 2.56). Because this interval contains zero, the result does not show that
female students have a higher average GPA than male students; the true difference could lie anywhere
from -1.56 to 2.56 points.
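The calculation for Situation 2 can be checked with the Python sketch below. The critical value 2.712 is copied from Step 2 rather than computed, and the helper name `unequal_variance_df` is introduced only for illustration.

```python
import math

def unequal_variance_df(s1, n1, s2, n2):
    """Degrees of freedom when the two standard deviations are different."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Figures from the female / male GPA example
n1, mean1, s1 = 25, 6.3, 3.0
n2, mean2, s2 = 15, 5.8, 1.8
t_critical = 2.712  # t value for alpha/2 = 0.005, df = 38, taken from Step 2

df = unequal_variance_df(s1, n1, s2, n2)
margin = t_critical * math.sqrt(s1**2 / n1 + s2**2 / n2)
lower = (mean1 - mean2) - margin
upper = (mean1 - mean2) + margin

# If zero lies inside the interval, no difference between the means is shown
contains_zero = lower < 0 < upper

print(round(df, 2), round(margin, 2), round(lower, 2), round(upper, 2), contains_zero)
```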
Formula of t-value test: $$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
Example 1: Based on the following information, let’s test the hypothesis that there is a difference
between the mean weight of boys and girls aged 10 to 15 at the 5% significance level (α = 0.05).
Boys Girls
Mean 𝑥̅1 = 28 𝑥̅2 = 23
Standard deviation 𝑠1 = 4 𝑠2 = 2
Sample size 𝑛1 = 30 𝑛2 = 28
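In this case $s_1 : s_2 = 4 : 2 = 2 > 1.5$ ➔ the Standard Deviations are DIFFERENT
Step 1: Assuming:
H0: There is NO difference between the mean weight of boys and girls ➔ $\mu_1 - \mu_2 = 0$
Ha: There is a difference between the mean weight of boys and girls ➔ $\mu_1 - \mu_2 \neq 0$
Step 2: Test statistic:
$$t = \frac{(28 - 23) - 0}{\sqrt{\frac{4^2}{30} + \frac{2^2}{28}}} = 6.08$$
Step 3: Degree of freedom (DF) when the standard deviations are DIFFERENT
$$DF = \frac{\left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \right)^2}{\frac{\left( \frac{s_1^2}{n_1} \right)^2}{n_1 - 1} + \frac{\left( \frac{s_2^2}{n_2} \right)^2}{n_2 - 1}} = \frac{\left( \frac{4^2}{30} + \frac{2^2}{28} \right)^2}{\frac{\left( \frac{4^2}{30} \right)^2}{30 - 1} + \frac{\left( \frac{2^2}{28} \right)^2}{28 - 1}} = 43.28 \approx 43$$
Looking at the critical values of the t table ➔ $t_{DF}^{\alpha/2} = t_{43}^{0.025} = 2.017$
Step 4: Comparison: $t = 6.08 > t_{43}^{0.025} = 2.017$ ➔ Reject H0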
Step 5: Conclusion: Reject H0 (or accept Ha), which means that there is a difference between the mean
weight of boys and girls aged 10 to 15.
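The test statistic and the comparison in this example can be verified with a few lines of code. This is a minimal sketch: `welch_t` is an illustrative name, and the critical value 2.017 is the table value quoted above.

```python
import math

def welch_t(mean1, s1, n1, mean2, s2, n2, hypothesized_diff=0.0):
    """t value for two independent samples with different standard deviations."""
    standard_error = math.sqrt(s1**2 / n1 + s2**2 / n2)
    return (mean1 - mean2 - hypothesized_diff) / standard_error

# Boys = sample 1, girls = sample 2
t_stat = welch_t(28, 4, 30, 23, 2, 28)
t_crit = 2.017  # table value for DF = 43, alpha/2 = 0.025
reject_h0 = abs(t_stat) > t_crit  # two-tailed comparison
print(round(t_stat, 2), reject_h0)
```

The absolute value is used in the comparison because the alternative hypothesis is two-tailed.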
Situation 1: If the Sig. value of Levene’s test < alpha = 0.05 ➔ the variances are different, in which case:
• The Sig. value of the t-test < 0.05 ➔ Reject H0: there is a difference between the 2 independent groups.
• The Sig. value of the t-test > 0.05 ➔ Accept H0: there is no difference between the 2 independent groups.
Situation 2: If the Sig. value of Levene’s test > alpha = 0.05 ➔ the variances are similar, in which case:
• The Sig. value of the t-test < 0.05 ➔ Reject H0: there is a difference between the 2 independent groups.
• The Sig. value of the t-test > 0.05 ➔ Accept H0: there is no difference between the 2 independent groups.
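The decision rule above can be written as a small function. This is a hedged sketch rather than SPSS behaviour itself: the function name and its arguments are hypothetical, and it assumes the SPSS output reports one t-test Sig. value for "Equal variances assumed" and another for "Equal variances not assumed".

```python
def independent_t_decision(levene_sig, sig_equal_var, sig_unequal_var, alpha=0.05):
    """Return (variance assumption, decision) following Situations 1 and 2."""
    if levene_sig < alpha:
        # Situation 1: variances are different, read the "not assumed" row
        assumption, t_sig = "variances different", sig_unequal_var
    else:
        # Situation 2: variances are similar, read the "assumed" row
        assumption, t_sig = "variances similar", sig_equal_var
    decision = "reject H0" if t_sig < alpha else "accept H0"
    return assumption, decision

# Hypothetical Sig. values, for illustration only
print(independent_t_decision(levene_sig=0.012, sig_equal_var=0.030, sig_unequal_var=0.041))
```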
Situation 1: Testing the difference between the mean of “Total Revenue” of Central region and West
region based on a data set of ABC company at 95% level of confidence (alpha = 0.05).
Assuming:
H0 There is no difference between the mean of Total Revenue of Central and
West region: 𝜇1 − 𝜇2 = 0
Ha There is difference between the mean of Total Revenue of Central and West
region: 𝜇1 − 𝜇2 ≠ 0
[Table: Independent Samples Test (SPSS output), showing Levene's Test for Equality of Variances and the
t-test for Equality of Means]
Conclusion: Reject H0 and accept Ha, which means there is a difference between the mean Total
Revenue of the Central and West regions. In other words, the average total revenue of the Central region
is different from that of the West region. As can be seen in the Group Statistics table, the mean total
revenue of the Central region is 379.962 and that of the West region is 203.064.
The Means plots of Total revenue variable and Region variable
(Source: attached spv file, 2021)
Situation 2: Testing the difference between the mean of “Unit sold” of Central region and West region
based on a data set of ABC company at 95% level of confidence (alpha = 0.05).
Assuming:
H0 There is no difference between the mean of Unit sold of Central and West
region: 𝜇1 − 𝜇2 = 0
Ha There is difference between the mean of Unit sold of Central and West
region: 𝜇1 − 𝜇2 ≠ 0
[Table: Independent Samples Test (SPSS output), showing Levene's Test for Equality of Variances and the
t-test for Equality of Means]
3.3.2 Dependent Samples
❖ Estimation
Situation: The two samples are considered dependent under the following conditions: the sample sizes
are equal, and each member of the first sample is associated with the corresponding member of the
second sample.
Example: Using the following information, let’s estimate the mean difference in scores between team
A and team B at 95% level of confidence (𝜶 = 𝟎. 𝟎𝟓).
Team A    23    20    19    21
Team B    20    21    17    19
$$s_D = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{(3 - 1.75)^2 + (-1 - 1.75)^2 + (2 - 1.75)^2 + (3 - 1.75)^2}{4 - 1}} = 1.89$$
Step 4:
$$\mu_D = \bar{x}_D \pm t_{DF}^{\alpha/2} \times \frac{S_D}{\sqrt{n_D}} = 1.75 \pm 3.182 \times \frac{1.89}{\sqrt{4}} = 1.75 \pm 3 = (-1.25 ; 4.75)$$
Step 5: Conclusion:
The mean difference in scores between team A and team B at 95% level of confidence is in the range
from -1.25 to 4.75 points. In other words, the average difference may be as high as 4.75 points (in
favour of team A) and as low as -1.25 (“-” indicates that it could be in favour of team B).
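The interval can be reproduced from the paired differences. The sketch below takes the four differences exactly as they appear in the worked calculation (3, −1, 2, 3); recomputing them from the score table would give 2 for the last pair (21 − 19), so the input list is an assumption carried over from the calculation. The critical value 3.182 is the table value for DF = 3 and α/2 = 0.025.

```python
import math
import statistics

differences = [3, -1, 2, 3]  # differences as used in the worked calculation
n = len(differences)

mean_d = statistics.mean(differences)
s_d = statistics.stdev(differences)  # sample standard deviation (divides by n - 1)

t_crit = 3.182  # table value, DF = n - 1 = 3
margin = t_crit * s_d / math.sqrt(n)
low, high = mean_d - margin, mean_d + margin
print(round(mean_d, 2), round(s_d, 2), round(low, 2), round(high, 2))
```

Small differences in the second decimal place against the hand calculation come from rounding the standard deviation and the margin at intermediate steps.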
❖ Hypothesis testing
In statistics, the dependent sample t-test compares the mean values of one sample across different
measurements. Because each observation in one sample must be paired with an observation in the
other, it is also known as the paired t-test. It is a statistical procedure for determining whether the
mean difference between two sets of observations is zero. In particular, the dependent sample t-test
is used when the cases in one sample are related to the cases in the other sample
(statisticssolutions.com, 2021).
Formula of t-value test:
$$t = \frac{\bar{x}_D - \mu_D}{\frac{S_D}{\sqrt{n_D}}}$$
Example 1: Using the following information, let’s test the theory that team A on average score higher
than team B at 95% level of confidence (𝜶 = 𝟎. 𝟎𝟓).
Team A 23 20 19 21
Team B 20 21 17 19
Step 1: Assuming
H0: Team A on average scores less than or equal to Team B ➔ $\mu_1 - \mu_2 = \mu_D \le 0$
Ha: Team A on average scores higher than Team B ➔ $\mu_1 - \mu_2 = \mu_D > 0$
$$s_D = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{(3 - 1.75)^2 + (-1 - 1.75)^2 + (2 - 1.75)^2 + (3 - 1.75)^2}{4 - 1}} = 1.89$$
Step 5: Comparison: $t = 1.851 < t_3^{0.05} = 2.353$ ➔ Accept H0
Conclusion: Accept H0 (do not accept Ha). In other words, there is not enough evidence to support the
theory that Team A on average scores higher than Team B at the 95% level of confidence.
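The one-tailed comparison can be checked in the same way. This is a sketch under the same assumption as the hand calculation, namely that the paired differences are 3, −1, 2 and 3; the critical value 2.353 is the table value for DF = 3 and α = 0.05 (one tail).

```python
import math
import statistics

differences = [3, -1, 2, 3]  # differences as used in the worked calculation
mu_d_h0 = 0                  # hypothesised mean difference under H0

mean_d = statistics.mean(differences)
s_d = statistics.stdev(differences)
t_stat = (mean_d - mu_d_h0) / (s_d / math.sqrt(len(differences)))

t_crit = 2.353               # table value, DF = 3, one-tailed alpha = 0.05
reject_h0 = t_stat > t_crit  # right-tailed test because Ha is mu_D > 0
print(round(t_stat, 2), reject_h0)
```

The t value comes out at about 1.85, below the critical value, so H0 is not rejected.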
Example 2: Testing the difference between the mean of the Unit Price variable and the mean of the Unit
Cost variable (amounts of money) based on a data set of ABC company at 95% level of confidence (alpha
= 0.05).
Assuming:
H0 There is no difference in the average of the amount of money
between the unit price and unit cost.
➔ 𝜇1 − 𝜇2 = 0
Ha There is difference in the average of the amount of money
between the unit price and unit cost.
➔ 𝜇1 − 𝜇2 ≠ 0
Based on the Paired Sample Statistics table, the results show that the “Unit Price” and “Unit Cost”
variables both have a sample size of 50 observations.
(Source: attached spv file, 2021)
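Paired Samples Test (Pair 1: UnitPrice - UnitCost)
Paired Differences:
    Mean                                         8.02460
    Std. Deviation                              11.39851
    Std. Error Mean                              1.61199
    95% Confidence Interval of the Difference    Lower 4.78518, Upper 11.26402
t                                                4.978
df                                               49
Sig. (2-tailed)                                  .000
Conclusion: Because Sig. (2-tailed) = .000 < alpha = 0.05, reject H0 and accept Ha. It means there is a
difference between the average unit price and the average unit cost.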
4. Measuring the association between two variables (from the dataset)
4.1 Correlation analysis
Correlation is a bivariate analysis that determines the strength of the relationship between two
variables as well as the direction of the relationship.
Moreover, the value of the correlation coefficient always ranges between -1 and +1. A value of ± 1
indicates that the two variables are perfectly associated, and the relationship becomes weaker as the
correlation coefficient approaches zero. The sign of the correlation coefficient reflects the direction
of the relationship: a + sign indicates a positive relationship, a – sign indicates a negative
relationship, and a value of 0 indicates that there is no linear relationship between the variables
(statisticssolutions.com, 2021).
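To make the definition concrete, the Pearson correlation coefficient can be computed directly from its formula. The sketch below uses made-up numbers, not the ABC company data, and the function name `pearson_r` is illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: as price rises, units sold fall along a straight line
price = [1, 2, 3, 4, 5]
units = [10, 8, 6, 4, 2]
r = pearson_r(price, units)
print(r)
```

Because the hypothetical points lie exactly on a falling straight line, the coefficient is at the negative end of the range.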
In the statistical task of ABC company, the author determined and analyzed the correlation
relationship between the variable "Unit price" and the variables "Unit cost", "Unit sold", "Profit" at
5% significance level (Alpha = 0.05) by using SPSS statistical software.
[Table: Correlations (SPSS output) for UnitPrice, UnitCost, UnitSold and Profit; N = 50 for every pair]
• The Sig. value of “Unit price” variable and “Unit sold” variable = 0.003 < alpha = 0.05 ➔ There is
a statistically significant linear relationship.
• The Pearson Correlation = -0.410 ➔ The direction of the relationship is negative (i.e., Unit
price and Unit sold variables are negatively correlated).
• The Sig. value of “Unit price” variable and “Profit” variable = 0.503 > alpha = 0.05 ➔ There is
no statistically significant linear correlation between these two variables.
Regression analysis is a set of statistical methods for estimating relationships between one or more
independent variables and a dependent variable (typically indicated by Y). In addition, it can be used
to determine the strength of the relationship between variables and to forecast future values of the
dependent variable. In statistics, the two basic types of regression analysis are simple linear
regression and multiple linear regression (corporatefinanceinstitute.com, 2021).
Y = a + bX
Where:
• X: the independent variable
• Y: the dependent variable
• a: the constant
• b: the coefficient of X (Holmes, Illowsky and Dean, n.d.).
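As an illustration of how a and b are obtained, the following sketch fits Y = a + bX by ordinary least squares. The data are hypothetical (they are not the ABC company values) and the function name is illustrative; SPSS reports the same two quantities in its Coefficients table.

```python
def fit_simple_regression(xs, ys):
    """Least-squares estimates (a, b) for the model Y = a + bX."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # coefficient of X (slope)
    a = mean_y - b * mean_x  # constant (intercept)
    return a, b

# Hypothetical data lying exactly on the line Y = 2 + 3X
unit_cost = [1.0, 2.0, 3.0, 4.0]
unit_price = [5.0, 8.0, 11.0, 14.0]
a, b = fit_simple_regression(unit_cost, unit_price)
print(a, b)
```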
In the statistical task, the simple linear regression model of the independent variable "Unit Cost"
and the dependent variable "Unit Price" from the data set of ABC company is analyzed at the 95% level
of confidence by using SPSS statistical software.
[Table: ANOVA (SPSS output); columns: Model, Sum of Squares, df, Mean Square, F, Sig.; Total: Sum of
Squares = 39820.319, df = 49]
[Table: Coefficients (SPSS output); columns: Unstandardized Coefficients, Standardized Coefficients,
Collinearity Statistics]
Where:
• Y: the dependent variable
• Xi: the independent variables
• a: the intercept.
• Bi: the coefficients
• U: the regression residual (error) (Holmes, Illowsky and Dean, n.d.).
The multiple linear regression model of the independent variables "Unit Sold", "Total Revenue" and
"Total Cost" and the dependent variable "Profit" from the data set of ABC company is analyzed at the
95% level of confidence by using SPSS statistical software.
Assuming: H0: $R^2 = 0$ ➔ The model does not exist
Ha: $R^2 \neq 0$ ➔ The model does exist
[Table: Model Summary (SPSS output); columns: Model, R, R Square, Adjusted R Square, Std. Error of the
Estimate, Durbin-Watson]
[Table: ANOVA (SPSS output); Total: Sum of Squares = 2620736.398, df = 49]
[Table: Coefficients (SPSS output); columns: Unstandardized Coefficients, Standardized Coefficients,
Collinearity Statistics]
• Sig. value of t-test of Unit Sold = 0.731 > α = 0.05, which means the independent variable “Unit
Sold” does NOT have a statistically significant influence on the dependent variable “Profit”.
• Sig. value of t-test of Total Revenue = 0.403 > α = 0.05, which means the independent variable
“Total Revenue” does NOT have a statistically significant influence on the dependent variable “Profit”.
• Sig. value of t-test of Total cost = 0.000 < α = 0.05, which means the independent variable “Total
cost” has a statistically significant influence on the dependent variable “Profit”.
• If all other factors are unchanged, when Total Revenue rises by 1 unit, the predicted Profit
increases by the coefficient 0.057 (although this effect is not statistically significant). The
constant 5.701 is the intercept of the model and is not part of this change.
• If all other factors are unchanged, when Total Cost rises by 1 unit, the predicted Profit increases
by the coefficient 0.861.
Based on the histogram with a normal curve above, the mean of the residuals is $1.47 \times 10^{-16}$,
which is practically 0, and the standard deviation is 0.969, which is close to 1; therefore it can be
concluded that the residuals are approximately normally distributed.
Based on the scatter plot above, the points are distributed randomly and gather around the zero line;
therefore, it can be concluded that the assumption of a linear relationship between the independent and
dependent variables is not violated.
Descriptive, exploratory, and confirmatory statistics are the three basic categories that all
statistical approaches belong to. Therefore, the contrasts between these three will be discussed in
this section, as well as how they affect the field of data analytics.
Descriptive statistics:
• Descriptive statistics reports and presentations generally consist of summary data tables, graphics,
and text to explain what the charts and tables are displaying (datascientistinsights.com, 2021).
• ... (mymarketresearchmethods.com, 2020).
Exploratory statistics:
• Exploratory analyses can also be used to define future research issues or investigations.
Furthermore, it is rarely the complete and definitive answer to the question at hand, but rather the
beginning. By using exploratory analysis, data analysts can look for clues and trends that will help
them conclude (online.ndm.edu, 2018).
• ... model that can describe the data in the most concise manner possible (online.ndm.edu, 2018).
Confirmatory statistics:
• An important factor of confirmatory data analysis is quantifying aspects such as the extent to which
any deviation from the model that analysts have established could have occurred by chance and when they
should begin doubting their model (sisense.com, 2021).
• ... tools of confirmatory analysis (online.ndm.edu, 2018).
III. Applying a range of statistical methods used in business planning for quality, inventory, and
capacity management
1. Measuring the variability in business processes or quality management
A measure of variability is a summary statistic that indicates the amount of dispersion in a data
set. In statistics, there are three typical measures of variability: range, variance, and standard
deviation (as discussed in previous sections). A low dispersion means that the data points tend to
cluster tightly around the center. In contrast, a high dispersion indicates that the points tend to
fall further apart (Frost, 2021).
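The three measures can be compared on two small data sets. The numbers below are hypothetical measurements chosen only to contrast low and high dispersion, and the helper name `variability` is illustrative.

```python
import statistics

def variability(data):
    """Return (range, sample variance, sample standard deviation)."""
    data_range = max(data) - min(data)
    return data_range, statistics.variance(data), statistics.stdev(data)

# Hypothetical measurements with the same mean (50) but different spread
tight = [49, 50, 50, 51]
spread = [30, 45, 55, 70]

for name, data in (("tight", tight), ("spread", spread)):
    data_range, variance, std_dev = variability(data)
    print(name, data_range, round(variance, 2), round(std_dev, 2))
```

Both sets are centred on the same value, but all three measures are much larger for the second one.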
All production and measuring procedures in the business world are subject to fluctuation over time.
As a result, the significance of variation in quality management cannot be overstated. In particular,
the percentage of deviation is lower when the variance is small, and vice versa. Therefore, if the
level of deviation is kept to a minimum, quality management becomes easier and the outcome is more
advantageous. The probability distributions are examined in depth in the next part to demonstrate the
importance of variation in quality management in relation to manufacturing distributions.
A probability distribution is a tool that describes the chance of obtaining each of a random
variable's distinct outcomes. To put it another way, the variable's values are determined by the
underlying probability distribution (Frost, 2021). There are many types of probability distributions.
Some of them, such as the Normal distribution, Poisson distribution, and Binomial distribution, are
explained in detail below.
2.1 Normal distribution
The normal distribution, commonly called the Gaussian distribution, is a symmetrical probability
distribution centered on the mean, indicating that data around the mean occur more frequently than
data far from it. The normal distribution appears as a bell curve on a graph (investopedia.com, 2021).
The bell curve is symmetrical: half of the data fall to the left of the mean and the other half fall
to the right. Furthermore, the highest peak of the normal curve is at the mean, which is also the
median and mode of the distribution (statisticshowto.com, n.d.).
A normal distribution
(Source: statisticshowto.com, n.d.)
Aside from that, the standard deviation controls the normal distribution's spread. A smaller standard
deviation means that the data are tightly packed around the mean, resulting in a taller, narrower
normal curve. On the other hand, a larger standard deviation implies that the data are spread out
around the mean, causing the normal distribution to be flatter and wider (statisticshowto.com, n.d.).
(Source: Holmes, A., Illowsky, B. and Dean, S., n.d)
A random variable has a standard normal probability distribution if it is normally distributed with a
mean of 0 and a standard deviation of 1. The letter z is widely used to represent this specific normal
random variable, which is obtained with the following formula:
$$z = \frac{X - \mu}{\sigma}$$
(Source: Holmes, A., Illowsky, B. and Dean, S., n.d)
Example 1: The weight of a product is normally distributed with a mean of 73.5 g and a standard
deviation of 10.5 g. Let's calculate the probability that X ≤ 80 g.
Step 1: $z = \frac{X - \mu}{\sigma} = \frac{80 - 73.5}{10.5} = 0.6190$
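Once z is known, the probability is read from the standard normal table. As a cross-check, the same lookup can be done with Python's standard library; this is only a sketch and is not the tool used in the assignment.

```python
from statistics import NormalDist

mu, sigma = 73.5, 10.5  # mean and standard deviation from Example 1
x = 80

z = (x - mu) / sigma     # Step 1: standardise X
p = NormalDist().cdf(z)  # P(Z <= z) for the standard normal distribution
print(round(z, 4), round(p, 3))
```

The same probability is obtained without standardising by calling `NormalDist(mu, sigma).cdf(x)`.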
Example 2: Clients' expenditure is normally distributed with a mean of $28 and a standard deviation of
$7.
• Calculate the probability that a randomly selected client spends less than $33.
• Calculate the probability that a randomly selected client spends between $13 and $33.
• Calculate the probability that a randomly selected client spends more than $9.
• Calculate the $ amount such that 72% of all clients spend no more than this amount.
The calculations are carried out by using Excel.
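The four questions can also be answered outside Excel. The following sketch uses Python's `statistics.NormalDist` as an alternative and assumes, as the example does, that spending is normally distributed with mean 28 and standard deviation 7; the variable names are illustrative.

```python
from statistics import NormalDist

spend = NormalDist(mu=28, sigma=7)

p_less_33 = spend.cdf(33)                     # P(X < 33)
p_13_to_33 = spend.cdf(33) - spend.cdf(13)    # P(13 < X < 33)
p_more_9 = 1 - spend.cdf(9)                   # P(X > 9)
amount_72 = spend.inv_cdf(0.72)               # amount a with P(X <= a) = 0.72

print(round(p_less_33, 4), round(p_13_to_33, 4), round(p_more_9, 4), round(amount_72, 2))
```

The first three lines use the cumulative probability up to a value, while the last one inverts it to find the value for a given probability.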
2.2 Poisson distribution
A Poisson distribution is a type of probability distribution that can be used to show how many times
an event is expected to occur during a fixed interval of time. To put it another way, it is a count
distribution. In addition, Poisson distributions are frequently used to model independent events that
happen at a steady rate during a specific time period (statisticshowto.com, 2021).
Some example graphs of Poisson distributions:
Poisson distributions are only defined for non-negative integers on the horizontal axis. Furthermore,
λ is the expected number of event occurrences in the interval (sometimes written as μ)
(statisticshowto.com, 2021).
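The probability of observing exactly k events follows from the standard Poisson formula $P(X = k) = \lambda^k e^{-\lambda} / k!$. The sketch below evaluates it for a hypothetical rate of λ = 3 events per interval, which is not taken from the assignment data.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with expected count lam."""
    return lam**k * math.exp(-lam) / math.factorial(k)

lam = 3  # hypothetical expected number of events per interval
for k in range(6):
    print(k, round(poisson_pmf(k, lam), 4))
```

Only integer values of k are evaluated, which matches the point that the distribution is defined for counts.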
The binomial distribution is a probability distribution that expresses the chance that a value will
take one of two possible values under a given set of parameters or assumptions. The binomial
distribution's underlying assumptions are that each trial has only one outcome, that each trial has
the same likelihood of success, and that the trials are independent of one another
(investopedia.com, 2020).
Probability function
In general, the binomial distribution, in contrast to a continuous distribution such as the normal
distribution, is a typical discrete distribution used in statistics. This is because each trial has
only two possible outcomes, 1 (for a success) and 0 (for a failure). Therefore, the binomial
distribution gives the probability of x successes in n trials, given a success probability p for each
trial (investopedia.com, 2020).
Besides that, binomial distributions must also satisfy the following three conditions:
• The number of observations or trials is fixed.
• Each observation or trial is independent of the others. To put it another way, none of the trials
has any bearing on the likelihood of the next trial.
• From one trial to the next, the likelihood of success (tails, heads, fail, or pass) is the same
(statisticshowto.com, 2021).
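Under these three conditions, the probability of x successes in n trials is given by the standard binomial formula $P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}$. The sketch below applies it to a hypothetical case of 10 fair coin tosses, which is an illustration and not part of the assignment data.

```python
import math

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

# Hypothetical example: exactly 5 heads in 10 tosses of a fair coin
n, p = 10, 0.5
print(round(binomial_pmf(5, n, p), 4))
```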
2.4 Inference
Statistical inference is the process of inferring the properties of an underlying probability
distribution through data analysis. Inferential statistical analysis infers the properties of a
population, for example by testing hypotheses and deriving estimates. It is assumed that the observed
data set is sampled from a larger population (Upton, G. and Cook, I. 2008).
Apart from that, inferential statistics can be contrasted with descriptive statistics. Descriptive
statistics is only concerned with the properties of the observed data, and it does not assume that the
data were obtained from a larger population. Besides that, in the context of machine learning, the term
inference is mostly used to mean making a prediction by evaluating an already trained model. In this
context, inferring the properties of the model is referred to as training or learning (rather than
inference), and using a model for prediction is referred to as inference (instead of prediction)
(courses.lumenlearning.com, n.d.).
For example, consider a corporation that is considering expanding into a new market. If this company
has to make $500,000 in revenue to break even, and its probability distribution indicates that there
is a 10% chance that revenues will be less than $500,000, the corporation can estimate the level of
risk it will face if it pursues that new business market (smallbusiness.chron.com, n.d.).
IV. Using appropriate charts and tables to communicate findings of given variables
1. Analysing data and interpreting results by using frequency distribution tables, graphs,
and charts
In statistics, the findings of data analysis are usually presented in the types of tables and charts
described below:
❖ Frequency table
A frequency table is a table that lists items and shows the number of times each item occurs.
Frequency tables can therefore be used to describe how often a specific value appears within a data
set. Moreover, frequency tables, often known as frequency distribution tables, are one of the most
fundamental methods for presenting descriptive statistics.
For example:
Frequency table of “Unit Price” variable
(Source: attached spv file, 2021)
Based on the frequency table above, analysts can easily see how often each price level of
the "Unit Price" variable appears in the data set.
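A frequency table of this kind can also be built with a few lines of code. The sketch below uses a short hypothetical list of unit prices, not the actual 50 observations, and counts how often each value occurs.

```python
from collections import Counter

# Hypothetical unit prices, for illustration only
unit_prices = [1.99, 1.29, 1.99, 4.99, 1.29, 1.99]

freq = Counter(unit_prices)
total = len(unit_prices)

# One row per distinct value: (value, frequency, percent of all observations)
rows = [(value, count, 100 * count / total) for value, count in sorted(freq.items())]

print(f"{'Value':>8} {'Frequency':>9} {'Percent':>8}")
for value, count, percent in rows:
    print(f"{value:>8} {count:>9} {percent:>8.1f}")
```

Sorting the rows by value mirrors the layout of the SPSS frequency table, where each distinct value is listed once with its count and percentage.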
❖ Pie chart
A pie chart, also known as a circle chart, is a circular statistical graphic that is divided into
sectors or segments to illustrate numerical proportions (byjus.com, 2021). Each sector represents a
proportionate share of the whole. In addition, a pie chart is one of the best methods for showing the
composition of something. In some cases, pie charts can replace other graphs such as bar graphs, line
plots, and histograms (byjus.com, 2021).
[Figure: Pie chart of the "Item" variable (segments include Eraser 10%, Pencil 24%, Brush set 6%, Desk,
Note, Sticker and Ruler)]
Using the data of the "Item" variable from the ABC company database, the pie chart above makes it very
easy to see how many items the company is currently distributing, and the chart also shows the
percentage that each item accounts for.
❖ Bar chart
A bar chart is a visual representation of categorical data (continuous data can be made categorical by
auto-binning). The bar chart illustrates data through a series of bars, each of which represents a
particular category. Vertical columns, horizontal bars, comparative bars (several bars to indicate a
comparison between values), and stacked bars are all options for bar charts. The height or length of
each bar corresponds to the value of the category it represents (investopedia.com, 2021).
[Figure: Bar chart of the "Unit Price" variable; x-axis: Unit Price (1.29, 1.99, 2.99, 4.99, 8.99,
15.99, 19.99, 125); y-axis: Frequency]
In this method, a histogram is generated and a normal curve is then overlaid on it, such that the area
under the curve is equal to the histogram's area. A histogram and a normal curve can be used together
to evaluate whether data are normally distributed.
For example
❖ Scatter plots
A scatter plot is a chart type that is typically used to illustrate and visually present the
relationship between variables. The variables' values are indicated by dots, and the position of each
dot on the vertical and horizontal axes shows the values of the respective data point. Scatter plots
therefore employ Cartesian coordinates to represent the values of the variables in a data set.
Scatter plots are also known as scattergrams, scatter graphs, or scatter charts
(corporatefinanceinstitute.com, n.d.).
(Source: attached spv file, 2021)
Based on the scatter plot in the section on multiple linear regression analysis, the points are
distributed randomly and gather around the zero line; therefore, it can be concluded that the
assumption of a linear relationship between the independent and dependent variables is not violated.
2. The strengths and weaknesses of using different types of charts and tables
Pie chart
Strengths:
• The circle's size can be made proportional to the quantity it represents.
• Easy to create and comprehend because of their extensive use in business and descriptive statistics.
• Permit a visual check of the reasonableness or accuracy of the calculation (bizfluent.com, 2018).
Weaknesses:
• It is difficult to determine the exact value expressed.
• A pie chart will be less effective if it presents too many pieces of data.
• Adding data labels and numbers to the pie chart may not be helpful because it will become
cluttered and difficult to read (bizfluent.com, 2018).

Bar chart
Strengths:
• Present relative numbers or proportions of many categories in the data set.
• Summarize a significant quantity of information from the data set in a visual, easy-to-understand
format.
Weaknesses:
• Fail to express main assumptions, causes, impacts, and patterns.
• Additional explanation is frequently required.
• Can be easily manipulated to create wrong perceptions (geographypoint.com, 2021).

Histogram
Strengths:
• Can graph large data sets easily with histograms.

Scatter plots
Strengths:
• Excellent at displaying financial data.
• Scatter plots are invaluable and can be used with almost any continuous scale data.
• They are simple to comprehend and interpret (preservearticles.com, n.d.).
Weaknesses:
• These graphs are unable to determine the exact degree of correlation.
• It is not a quantitative measure of the variables' relationship; it only gives a visual impression
of how one variable changes with the other (preservearticles.com, n.d.).
For example, for descriptive statistics analysis, frequency tables and bar charts would be appropriate
to present the results, as they highlight the trends and frequencies of the data in a clear, simple,
and understandable way.
Besides that, for the analysis of the relationship between variables in a typical data set, such as
correlation and regression analysis, scatter plots may be the most suitable tool to present the
results, because scatter plots consist of multiple data points plotted on two axes. Each variable
depicted in a scatter plot has multiple observations, and the plot is mainly used for the analysis of
correlation and distribution and to determine whether two different variables are correlated. At the
same time, scatter plots are often used to support predictive models and statistical decision-making in
enterprises. As a result, it is easier for analysts to understand the meaning of the results and to
draw conclusions more accurately.
For the probability distribution analysis, I also consider the histogram with a normal curve to be a
suitable tool for illustrating the results. Based on the histogram combined with the normal curve, the
analyst can draw accurate conclusions about whether the quality indicator or the process related to
the data set is normally distributed. From there, they can make appropriate decisions to improve
quality in the future.
V. Conclusion
To conclude, this study analyzed and evaluated raw business data by using a variety of statistical
approaches, and presented the outcomes in the form of appropriate tables and charts. Additionally,
the distinctions in application between descriptive, exploratory, and confirmatory analysis of
business and economic data were analyzed before justified suggestions and assessments for
enhancing company planning through statistical approaches were made. As a result, I better understand
the nature of statistics for management and can perform data analysis with the above statistical
methods in SPSS software. From those findings, I have enough evidence and data to help the managers of
ABC company make business decisions and new strategies more effectively.
VI. References
statistics.laerd.com, 2021. Mean, Mode and Median - Measures of Central Tendency - When to use
with Different Types of Variable and Skewed Distributions | Laerd Statistics. [online]
Statistics.laerd.com. Available at: https://statistics.laerd.com/statistical-guides/measures-central-
tendency-mean-mode-median.php [Accessed 20 June 2021].
byjus.com, 2021. Central Tendency Definition | Measures of Central Tendency & Examples. [online]
BYJUS. Available at: https://byjus.com/maths/central-tendency/ [Accessed 20 June 2021].
statisticshowto.com, 2021. Inferential Statistics: Definition, Uses. [online] Statistics How To. Available
at: https://www.statisticshowto.com/probability-and-statistics/statistics-definitions/inferential-
statistics/ [Accessed 21 June 2021].
Surbhi, S., 2017. Difference Between Population and Sample (with Comparison Chart) - Key
Differences. [online] Key Differences. Available at: https://keydifferences.com/difference-between-
population-and-sample.html [Accessed 21 June 2021].
statisticssolutions.com, 2021. Independent Sample T-Test - Statistics Solutions. [online] Statistics
Solutions. Available at: https://www.statisticssolutions.com/independent-sample-t-test/ [Accessed 2
July 2021].
statisticssolutions.com, 2021. Correlation (Pearson, Kendall, Spearman) - Statistics Solutions. [online]
Statistics Solutions. Available at: https://www.statisticssolutions.com/free-resources/directory-of-
statistical-analyses/correlation-pearson-kendall-spearman/ [Accessed 6 July 2021].
datascientistinsights.com, 2021. Six Types Of Analyses Every Data Scientist Should Know - Data
Scientist Insights. [online] Data Scientist Insights. Available at:
https://datascientistinsights.com/2013/01/29/six-types-of-analyses-every-data-scientist-should-
know/ [Accessed 7 July 2021].
online.ndm.edu, 2018. Exploratory Analysis vs. Confirmatory Analysis. [online] NDMU Online.
Available at: https://online.ndm.edu/news/analytics/exploratory-analysis-vs-confirmatory-analysis/
[Accessed 7 July 2021].
sisense.com, 2021. Exploratory and Confirmatory Analysis: What’s the Difference? l Sisense. [online]
Sisense. Available at: https://www.sisense.com/blog/exploratory-confirmatory-analysis-
whatsdifference/ [Accessed 7 June 2021].
Frost, J., 2021. Measures of Variability: Range, Interquartile Range, Variance, and Standard Deviation
- Statistics By Jim. [online] Statistics By Jim. Available at: https://statisticsbyjim.com/basics/variability-
range-interquartile-variance-standard-deviation/ [Accessed 7 July 2021].
Frost, J., 2021. Understanding Probability Distributions - Statistics By Jim. [online] Statistics By Jim.
Available at: https://statisticsbyjim.com/basics/probability-distributions/ [Accessed 7 July 2021].
investopedia.com, 2021. Normal Distribution. [online] Investopedia. Available at:
https://www.investopedia.com/terms/n/normaldistribution.asp [Accessed 7 July 2021].
statisticshowto.com, 2021. Poisson Distribution / Poisson Curve: Simple Definition. [online] Statistics
How To. Available at: https://www.statisticshowto.com/poisson-distribution/ [Accessed 7 July 2021].
investopedia.com, 2020. How Binomial Distribution Works. [online] Investopedia. Available at:
https://www.investopedia.com/terms/b/binomialdistribution.asp [Accessed 7 July 2021].
statisticshowto.com, 2021. Binomial Distribution: Formula, What it is, How to use it. [online] Statistics
How To. Available at: https://www.statisticshowto.com/probability-and-statistics/binomial-
theorem/binomial-distribution-formula/ [Accessed 7 July 2021].
byjus.com, 2021. Pie Chart (Definition, Formula, Examples) | Making a Pie Chart. [online] BYJUS.
Available at: https://byjus.com/maths/pie-chart/ [Accessed 8 July 2021].
investopedia.com, 2021. Bar Graph Definition and Examples. [online] Investopedia. Available at:
https://www.investopedia.com/terms/b/bar-graph.asp [Accessed 8 July 2021].
preservearticles.com, n.d. What are the Merits and Demerits of Scatter Diagram?. [online]
PreserveArticles.com: Preserving Your Articles for Eternity. Available at:
https://www.preservearticles.com/articles/what-are-the-merits-and-demerits-of-scatter-
diagram/7724 [Accessed 8 July 2021].
Holmes, A., Illowsky, B. and Dean, S., n.d. Introductory business statistics. Houston, Texas: OpenStax.