570 - Assignment 02 - Nguyen Ngoc Van Anh

This document is the front sheet for Assignment 02 which analyzes raw business data using statistical methods. It includes information such as the student name and ID, unit number, submission dates, and assessor details. The front sheet also lists the grading criteria and contains sections for student and assessor signatures and feedback. Finally, it provides a table of contents outlining the report structure in 5 sections that will analyze and evaluate qualitative and quantitative data, apply statistical methods to business planning, communicate findings using charts and tables, and draw conclusions.

ASSIGNMENT 02 FRONT SHEET

Qualification BTEC Level 5 HND Diploma in Business

Unit number and title Unit 31: Statistics for management

Submission date Date received (1st Submission)

Re-submission date Date received (2nd Submission)

Student Name NGUYEN NGOC VAN ANH Student ID GBS190587

Class No. GBS0901 Assessor Name VO MINH VINH

Student declaration
I certify that the assignment submission is entirely my own work and I fully understand the consequences of plagiarism.
I understand that making a false declaration is a form of malpractice.
Student Signature

Grading grid

P3 P4 P5 M2 M3 M4 D1 D2 D3

Description of activity undertaken

Assessment & Grading criteria

How the activity meets the requirements of the criteria

Student Signature Date:

Assessor Signature Date:

Assessor name:

Summative Feedbacks Resubmission Feedbacks

Grade: Assessor Signature: Date:


Internal Verifier’s Comments:

Signature & Date:

Table of Contents
I. Introduction
II. Analysing and evaluating qualitative and quantitative raw business data from a range of examples using appropriate statistical methods
  1. Differences between quantitative and qualitative raw data analysis
  2. Descriptive statistics
    2.1 Measures of Central Tendency
    2.2 Measures of Variability
  3. Inferential statistics
    3.1 The differences between Population and Sample
    3.2 One sample T-test: Estimation and Hypotheses testing
      3.2.1 The mean Estimation
      3.2.2 Hypotheses testing of a sample
        Two-tailed testing
        One-tailed testing
    3.3 Two samples T-test and Independent Sample T-test: Estimation and Hypotheses testing
      3.3.1 Independent Samples
        Estimation
        Hypothesis testing (2-tailed)
      3.3.2 Dependent Samples
        Estimation
        Hypothesis testing
  4. Measuring the association between two variables (from the dataset)
    4.1 Correlation analysis
    4.2 Regression analysis and simple forecasting
      Simple linear regression model
      Multiple linear regression model
  5. The differences in application among descriptive, exploratory and confirmatory analysis techniques in general
III. Applying a range of statistical methods used in business planning for quality, inventory and capacity management
  1. Measuring the variability in business processes or quality management
  2. Measuring the probability by using probability distributions to business operations and processes
    2.1 Normal distribution
    2.2 Poisson distribution
    2.3 Binomial distribution
    2.4 Inference
  3. Evaluations and recommendations for improving business planning through statistical methods above
IV. Using appropriate charts and tables to communicate findings of given variables
  1. Analysing data and interpreting results by using frequency distribution tables, graphs, and charts
  2. The strengths and weaknesses of using different types of charts and tables
  3. The most effective way of communicating the results of the analysis
V. Conclusion
VI. References

I. Introduction
Statistics plays a significant role in every enterprise or organization. It provides the economic,
financial and marketing information that a corporation needs to succeed in the marketplace. This
assignment analyses and evaluates raw business data using several common statistical methods.
These methods are then applied to the business planning of a specific company. Finally, all findings
and outcomes are presented in appropriate graphs and tables.

ABC company specializes in distributing good-quality stationery in Ho Chi Minh. It is a distribution
agent for many well-known brands that are trusted by many customers, and it provides a full range
of commonly used items such as pens, brushes, binders, and all kinds of printing and photocopying
paper. ABC company has had relatively stable revenue for each product, but several new competitors
have recently entered the market, so the range of goods on offer is becoming more diverse by the
day. This causes certain difficulties for ABC company's business, and it is the reason for this research.

For that reason, as a research analyst, the author will compile and analyse ABC company's sales data
to gain a deeper understanding of its business performance from 2019 to 2020. The results of this
research will also be the premise for the company's upcoming business plan to improve product
quality and the effectiveness of its marketing activities.

This research gives the researcher an accurate and comprehensive view of the figures, data and
business context of the company. Conducting the analysis also yields accurate and complete
statistics on goods, revenue and profit. The statistical results can therefore support the company's
managers in making new decisions and strategies more effectively.

The methodology of this study is secondary research. In particular, the sales data of company ABC
is obtained from a published source via the website https://data.world/ . The data table includes
9 variables (Order date, Region, Item, Units sold, Unit cost, Unit price, Total revenue, Total
Cost, and Profit), with 50 observations.

II. Analysing and evaluating qualitative and quantitative raw business data from a range of
examples using appropriate statistical methods

1. Differences between quantitative and qualitative raw data analysis


Raw data in statistics normally comes from either a population or a sample. Most raw data falls into
one of two categories: qualitative or quantitative. This section clarifies the differences between
these two types of data.

Quantitative data
Quantitative data are always numbers. They are the result of counting or measuring attributes of a
population, and they may be discrete or continuous. Continuous data is further divided into interval
data and ratio data. Researchers normally prefer quantitative data to qualitative (categorical) data
because it is easier to analyse mathematically (Holmes, Illowsky and Dean, n.d.).

Qualitative data
Qualitative data is the outcome of categorizing or describing a population's characteristics. It is
also called categorical data and is generally expressed in words or letters rather than as a numerical
measurement. Nominal data and ordinal data are the two types of qualitative data. Nominal data
names or labels categories without any order, whereas ordinal data places them in a ranked order
(Holmes, Illowsky and Dean, n.d.).

Table 1: Example of Quantitative data
(Source: Attached excel file, 2021)

Table 2: Example of qualitative data


(Source: Author, 2021)

Each type of data analysis has its own benefits and drawbacks. The following part examines these
aspects of quantitative and qualitative data analysis.

Quantitative data analysis

Advantages
• Larger sample sizes allow a more comprehensive interpretation of the findings and let analysts establish more assumptions about the target audience of the study.
• Data collected through self-completion activities such as online surveys is normally anonymous, which is especially useful when dealing with sensitive issues.
• Online and mobile surveys are faster and easier to conduct, so analysts can receive findings in real time.

Disadvantages
• Because of the privacy of the respondents, no follow-up responses can be obtained after they have finished answering. This particularly affects the validity of the findings when the results are inconclusive.
• Because analysts are limited by the survey's predetermined responses, they cannot delve deeper into behaviours, opinions and causes as they could in qualitative research. This is especially true of self-completion (online) questionnaires.

Qualitative data analysis

Advantages
• The data is gathered at a more individual level and can go into further detail. It is therefore used to gain deeper knowledge of participants' thoughts and activities, in order to produce or investigate a hypothesis in greater depth.
• It encourages discussion, since it is conducted in a more open way rather than following a predetermined set of questions. This provides context to the investigation instead of just data.
• It is flexible: the interviewer can explore questions related to the research topic that they feel are important or had not considered before the conversation, and can vary the setting.

Disadvantages
• Sample size can be an issue, for example if the researcher generalizes from the opinions of 5 respondents out of 300 in the target audience or subscribers.
• Bias in sample selection: the individuals chosen to participate in qualitative research might all hold the same opinion about the research topic, rather than being a collection of people with differing viewpoints. A range of viewpoints is more valuable, especially when the participants debate them in focus groups.

(Source: anparresearchltd.com, 2020)

2. Descriptive statistics
2.1 Measures of Central Tendency
A measure of central tendency is a single value that describes a data set by identifying the centre
position within it. The mean, median and mode are the common measures of central tendency
(Holmes, Illowsky and Dean, n.d.).

Mean
The most common and best-known measure of central tendency is the mean, or average. It can be
used with both discrete and continuous data, although it is most often used with continuous data
(statistics.laerd.com, 2021). The mean is calculated by dividing the sum of all values in the data
set by the number of values in the data set, according to the following basic formula:

\[ \bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} \]
(Source: statistics.laerd.com, 2021)
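
As a minimal sketch of this formula, the following Python snippet computes the mean by hand and with the standard library. The five values are borrowed from the variance example later in this report rather than from the ABC company data, and the variable names are illustrative assumptions.

```python
from statistics import mean

# Illustrative values (the same five numbers used in the variance example).
values = [8, 9, 7, 6, 5]

# Mean = sum of all values divided by the number of values.
manual_mean = sum(values) / len(values)

# The standard library gives the same result.
library_mean = mean(values)

print(manual_mean, library_mean)
```

Both approaches give the same figure, which is what the "Average" function in Excel would return for the same column.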

Furthermore, one of the most essential properties of the mean is that it takes every value in the
data set into account. For example, to calculate the mean of the variable "Total Revenue" in the
attached excel file, the author used the "Average" function.

(Source: Attached excel file, 2021)

Therefore, the calculated mean of the variable "Total Revenue" is $504.64. This means that the
average total revenue per observation earned by ABC company in the data set is $504.64. This
value is represented on the following histogram:

Histogram of Total Revenue


(Source: Attached excel file, 2021)

Mode
The mode is the value that occurs most frequently in the data set. In other words, it is the most
typical value in a group of data. There is no algebraic formula for the mode, and it is typically
used for categorical data because it only considers the most frequently recurring element of the
collection (statistics.laerd.com, 2021).
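
Since the mode has no formula, it is found by counting. The sketch below shows one way to do this in Python. The list of prices is invented purely for illustration and is not the ABC company data set.

```python
from collections import Counter

# Hypothetical unit prices, made up for illustration only.
unit_prices = [19.99, 4.99, 19.99, 1.99, 8.99, 19.99, 4.99]

# Count how many times each value occurs.
counts = Counter(unit_prices)
top_count = max(counts.values())

# Keep every value that reaches the highest count (a data set can have several modes).
modes = [value for value, count in counts.items() if count == top_count]

print(modes, top_count)
```

Returning a list rather than a single value is a deliberate choice here, because a data set may have more than one mode.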

For example, the author used SPSS to calculate the mode of the variable “Unit Price” from the
attached excel file. The SPSS results for the mode of the “Unit Price” variable are shown below.

Frequency of Unit Price variable


(Source: attached spv file, 2021)
From the above two tables, the mode of the variable "Unit Price" is 19.99. This means that $19.99
is the price that appears most often in the data set of the Unit Price variable. This value is also
presented on the following bar chart:


Bar chart of Unit Price


(Source: Attached excel file, 2021)

Median
The median is the middle value of a list of numbers sorted in ascending or descending order. It can
be used for interval or ordinal data. When a data set has an even number of items, the median is the
average of the two central values (byjus.com, 2021). In theory, the median is found with the
following formulas:

• Median = \( \left(\frac{n+1}{2}\right) \)th value (in case n is an odd number) (Holmes, Illowsky and Dean, n.d.).

• Median = \( \frac{\left(\frac{n}{2}\right)\text{th value} + \left(\frac{n}{2}+1\right)\text{th value}}{2} \) (in case n is an even number) (Holmes, Illowsky and Dean, n.d.).

Example 1: The following dataset has an odd number of observations that are organized in
descending order: 23, 21, 18, 16, 15, 13, 12, 10, 9, 7, 6.

In this case: n = 11 ➔ Median value = \( \left(\frac{11+1}{2}\right) \)th value = 6th value = 13

Example 2: The following dataset has an even number of observations that are organized in
descending order: 23, 21, 18, 16, 15, 13, 12, 10, 9, 7, 6, 4.

In this case: n = 12 ➔ Median value = \( \frac{\left(\frac{12}{2}\right)\text{th} + \left(\frac{12}{2}+1\right)\text{th}}{2} = \frac{6\text{th} + 7\text{th}}{2} = \frac{13 + 12}{2} = 12.5 \)
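
The two examples above can be checked with a short Python sketch. The function name `median_value` is an illustrative assumption; the two lists are the data sets from Example 1 and Example 2.

```python
def median_value(values):
    """Return the median: the middle value, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)th value, which is index mid when counting from zero.
        return ordered[mid]
    # Even n: average of the (n / 2)th and (n / 2 + 1)th values.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_data = [23, 21, 18, 16, 15, 13, 12, 10, 9, 7, 6]       # Example 1, n = 11
even_data = [23, 21, 18, 16, 15, 13, 12, 10, 9, 7, 6, 4]   # Example 2, n = 12

print(median_value(odd_data), median_value(even_data))
```

The function sorts the data first, so the result is the same whether the list is given in ascending or descending order.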

Besides that, for ABC company, the table below presents the median values of the two variables
“Unit Sold” and “Unit Price” from the attached excel file, calculated using SPSS.

(Source: attached spv file, 2021)

Based on this table, the median value of “Unit Sold” variable is 50.50 and the median value of “Unit
Price” variable is 8.99.

2.2 Measures of Variability
A measure of variability is a summary statistic that indicates the amount of dispersion in a data
set. The range, variance and standard deviation are the three most commonly used measures of
variability (Holmes, Illowsky and Dean, n.d.). These three measures are examined in detail in the
following paragraphs.

Range
The range is usually the simplest measure of variability to compute and understand. The range of a
data set is the distance between its highest and lowest values. In other words,
Range = Maximum value − Minimum value (onlinestatbook.com, n.d.).

Example 1: Considering the list of numbers including 2, 4, 5, 7, 7, 12, 13, 14.

In this case: Maximum value = 14 and Minimum value = 2 ➔ Range value = 14 – 2 =12
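
A minimal Python sketch of the same calculation, using the list from Example 1 (the variable names are illustrative):

```python
# Data from Example 1.
data = [2, 4, 5, 7, 7, 12, 13, 14]

# Range = maximum value minus minimum value.
data_range = max(data) - min(data)

print(data_range)
```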

Example 2: In the statistical task of ABC company, the range of the “Profit” variable in the
attached excel file is calculated using SPSS. The following table shows the result.

(Source: attached spv file, 2021)


The above table shows that, for 50 observations, the range of the "Profit" variable is 1028.23.
This means that the distance between the minimum and maximum values of this variable is 1028.23.

Variance
In statistics, variance describes how far the observations in a distribution are spread out from
each other. Using the mean as the measure of the centre of the distribution, the variance is defined
as the average squared difference of the values from the mean. To calculate it, the distance between
each observation and the mean is squared, and the squared distances are then summed and averaged
(onlinestatbook.com, n.d.). Variance is also an essential step when calculating a distribution's
standard deviation. There are two formulas for the variance, one for a sample and one for a
population:

Sample variance: \( s^2 = \frac{\sum (X_i - \bar{X})^2}{N - 1} \)

Population variance: \( \sigma^2 = \frac{\sum (X_i - \mu)^2}{N} \)

\( s^2 \): The sample variance
\( \sigma^2 \): The population variance
\( \bar{X} \): The sample mean
\( \mu \): The population mean

(Source: onlinestatbook.com, n.d.)

Example 1: Consider the sample data set 8, 9, 7, 6, 5. Because this is a sample, the sample variance
formula is used.

In this case: N = 5 and \( \bar{X} \) = (8 + 9 + 7 + 6 + 5) : 5 = 7

➔ \( s^2 = \frac{(8-7)^2 + (9-7)^2 + (7-7)^2 + (6-7)^2 + (5-7)^2}{5-1} = 2.5 \)
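
The difference between the two variance formulas can be seen in a short Python sketch using the data from Example 1. The variable names are illustrative; `variance` and `pvariance` are the sample and population variance functions of Python's standard `statistics` module.

```python
from statistics import pvariance, variance

sample = [8, 9, 7, 6, 5]
n = len(sample)
sample_mean = sum(sample) / n

# Sum of squared distances from the mean.
squared_distances = sum((x - sample_mean) ** 2 for x in sample)

# Sample variance divides by N - 1; population variance divides by N.
sample_variance = squared_distances / (n - 1)
population_variance = squared_distances / n

print(sample_variance, population_variance)
print(variance(sample), pvariance(sample))
```

For the same five values the sample formula gives 2.5 while the population formula gives 2.0, which is why it matters to state which one is being used.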

Example 2: In the statistical task of ABC company, the variance of the “Unit Cost” variable in the
attached excel file is calculated using SPSS and Excel. The two following tables show this finding.

(Source: Attached excel file, 2021)

(Source: attached spv file, 2021)

From results in the above table, the variance of “Unit Cost” variable is 295.395. Besides, a large
variance indicates that the values in the data set are far from the mean value and highly variable, while
a small variance indicates the opposite.

Standard Deviation
For continuous distributions, the standard deviation is an essential measure of variability. It is
regarded as the common language for describing fluctuation. In other words, it gives a better
understanding of how much observations deviate from the mean and how much overall variability a
distribution contains. Particular observations can also be interpreted in terms of their
relationship to the mean by using standard deviations (scalestatistics.com, 2021).

Standard deviation is calculated by taking the square root of the variance in a distribution
(onlinestatbook.com, n.d.).

Sample standard deviation: \( s = \sqrt{s^2} \)

Population standard deviation: \( \sigma = \sqrt{\sigma^2} \)

\( s^2 \): The sample variance
\( \sigma^2 \): The population variance
\( s \): The sample standard deviation
\( \sigma \): The population standard deviation

(Source: scalestatistics.com, 2021)

Example 1: Consider the sample data set 8, 9, 7, 6, 5. Because this is a sample, the sample standard
deviation formula is used.

In this case: N = 5 and \( \bar{X} \) = (8 + 9 + 7 + 6 + 5) : 5 = 7

➔ \( s^2 = \frac{(8-7)^2 + (9-7)^2 + (7-7)^2 + (6-7)^2 + (5-7)^2}{5-1} = 2.5 \)

➔ \( s = \sqrt{s^2} = \sqrt{2.5} = 1.58 \)
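
The same result can be reproduced with a minimal Python sketch. The names are illustrative; `stdev` is the sample standard deviation function of the standard `statistics` module.

```python
import math
from statistics import stdev, variance

sample = [8, 9, 7, 6, 5]

# Standard deviation is the square root of the variance.
manual_sd = math.sqrt(variance(sample))
library_sd = stdev(sample)

print(round(manual_sd, 2), round(library_sd, 2))
```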

Example 2: Using the data of the variable “Unit Cost” from the attached excel file, the author used
SPSS to calculate the sample standard deviation.

(Source: attached spv file, 2021)


Based on the above table, the standard deviation of the “Unit Cost” variable is 17.18. In general,
the further the values of a data set lie from the mean, the more spread out the data is and the
higher its standard deviation.

3. Inferential statistics
In general, inferential statistics uses a random sample of data taken from a population to describe
and draw conclusions about that population. In other words, it enables researchers to make
conclusions (“inferences”) that generalize from sample data to the population
(statisticshowto.com, 2021).

There are two important aspects of inferential statistics: Estimation and Hypothesis testing
(statisticshowto.com, 2021). Estimation is divided into Point Estimate and Interval Estimate.
Hypothesis testing in inferential statistics consists of One-tailed tests and Two-tailed tests
(Holmes, Illowsky and Dean, n.d.).

3.1 The differences between Population and Sample

Every researcher should be aware of the difference between population and sample because these
are important concepts in research. The distinction between a specific population and a sample is
simple to grasp.

Firstly, a population is defined as the totality of all the elements under investigation that share
one or more common properties. In other words, a population is made up of all members of a specific
group, as well as all conceivable outcomes or measurements. Furthermore, a population does not
have to consist only of people; it can also contain animals, events, items, structures, and so
on. The specific population is determined by the study's scope (Surbhi, 2017).

Secondly, a sample is a subset of the population chosen at random to take part in the research. The
sample should be chosen so that it accurately represents the population in all of its properties and
is free of bias, giving a small cross-section of the population, because the statistical inferences
from a sample are used to make population-wide generalizations (Surbhi, 2017). When carrying out
statistical tests on a large scale, researchers commonly use samples rather than populations because
it is hard to collect data from the whole population.

(Source: Surbhi, 2017)

3.2 One sample T-test: Estimation and Hypotheses testing

In statistics, the one-sample t-test is a statistical hypothesis test used to examine whether an
unknown population mean differs from a specific value (stattrek.com, 2021). In this section, mean
estimation and hypothesis testing for a single sample are explained in detail.

3.2.1 The mean Estimation

In statistics, an estimate is a value calculated from a sample that is expected to represent the
corresponding value in the population. Estimation has two types: point estimate and interval
estimate (Holmes, Illowsky and Dean, n.d.). This section considers interval estimation.

Interval estimation uses sample data to calculate a range of possible values for a parameter. A
confidence interval is an estimated range of values, calculated from a particular sample, that is
likely to include an unknown population parameter (Holmes, Illowsky and Dean, n.d.).

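As σ known: \( \mu = \bar{X} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} \)
\( \mu \): Population mean
\( \bar{X} \): Sample mean
\( \alpha \): Significance level
\( z_{\alpha/2} \): The z-value of the standard normal distribution
\( \sigma \): Population standard deviation
n: Sample size

As σ unknown: \( \mu = \bar{X} \pm t^{\alpha/2}_{(n-1)} \cdot \frac{S}{\sqrt{n}} \)
\( \mu \): Population mean
\( \bar{X} \): Sample mean
\( \alpha \): Significance level
\( t^{\alpha/2}_{(n-1)} \): The value of the Student (t) distribution with n – 1 degrees of freedom
S: Sample standard deviation
n: Sample size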

(Source: Holmes, Illowsky and Dean, n.d.)

Example 1: A sample of 30 people has a mean income of $750 and a sample standard deviation of $65.
Let's calculate the interval estimate of the population mean income at the 1% significance level
(99% confidence).

In this case: \( \alpha = 0.01 \), \( n = 30 \), \( \bar{x} = 750 \), \( s = 65 \)

Looking at the table of critical values of t ➔ \( t^{\alpha/2}_{(n-1)} = t^{0.01/2}_{(30-1)} = t^{0.005}_{29} = 2.756 \)

Applying the formula: \( \mu = \bar{x} \pm t^{\alpha/2}_{(n-1)} \cdot \frac{s}{\sqrt{n}} = 750 \pm 2.756 \cdot \frac{65}{\sqrt{30}} = 750 \pm 32.71 \)

= (717.29 ; 782.71) ➔ \( 717.29 \le \mu \le 782.71 \)

Conclusion: At the 1% significance level (99% confidence), the population mean income is estimated
to lie between $717.29 and $782.71.
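
The arithmetic of Example 1 can be sketched in Python as follows. The function name `t_interval` is an illustrative assumption, and the critical value 2.756 is typed in from the t table, as in the example, rather than computed by the code.

```python
import math


def t_interval(sample_mean, sample_sd, n, t_critical):
    """Return (lower, upper, margin) for a mean when sigma is unknown."""
    # Margin of error = critical t value * s / sqrt(n).
    margin = t_critical * sample_sd / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin, margin


# Values from Example 1; 2.756 is the table value for alpha/2 = 0.005 and 29 degrees of freedom.
lower, upper, margin = t_interval(sample_mean=750, sample_sd=65, n=30, t_critical=2.756)

print(round(margin, 2), round(lower, 2), round(upper, 2))
```

Passing the critical value in as an argument keeps the sketch free of third-party libraries; a statistics package could supply it instead.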

Example 2: In the statistical task of ABC company, the author estimates the population mean of the
"Profit" variable using Excel, based on the data in the attached excel file, at the 95% level of
confidence (significance level = 0.05).

(Source: Attached excel file, 2021)

The table above shows that, at the 95% confidence level, the sample mean is 263.70 and the margin of
error of the interval estimate of the population mean is 34.52. From these, the lower limit is
229.18 and the upper limit is 298.21. Therefore, the population mean of the “Profit” variable is
estimated to lie between 229.18 and 298.21 at 95% confidence.

3.2.2 Hypotheses testing of a sample

In statistics, a statistician makes judgments about findings through a process known as "hypothesis
testing". A hypothesis test entails gathering and analyzing data from a sample. The statistician
then decides, based on the data analysis, whether there is enough evidence to reject the null
hypothesis. The test starts with two hypotheses, the null hypothesis (H0) and the alternative
hypothesis (Ha), which express opposing points of view (Holmes, Illowsky and Dean, n.d.).

• H0: The null hypothesis states that there is no difference between a sample mean or
proportion and a population mean or proportion. In other words, the difference is zero
(Holmes, Illowsky and Dean, n.d.).

• Ha: The alternative hypothesis is a claim about the population that contradicts H0 and is
what analysts conclude when they reject H0 (Holmes, Illowsky and Dean, n.d.).

The following table shows the hypotheses in their relevant pairs. In theory, H0 is always the
hypothesis that contains the equality (=, ≤ or ≥).

(Source: Holmes, Illowsky and Dean, n.d.)

❖ Steps in Hypothesis testing

Step 1: Specifying the Null Hypothesis (H0) and the Alternative Hypothesis (Ha)

Step 2: Setting the Significance Level (𝛼)

Step 3: Collecting the sample data and computing the value of the test statistic.

Step 4: Using the level of significance to determine the critical value and the rejection rule.

Step 5: Using the value of the test statistic and the rejection rule to determine whether to
reject H0 or not (nedarc.org, 2019).

Furthermore, a hypothesis test about the value of a population mean μ must typically take one of the
three forms listed below.

One-tailed (Left-tail): H0: \( \mu \ge \mu_0 \), Ha: \( \mu < \mu_0 \)
One-tailed (Right-tail): H0: \( \mu \le \mu_0 \), Ha: \( \mu > \mu_0 \)
Two-tailed: H0: \( \mu = \mu_0 \), Ha: \( \mu \ne \mu_0 \)

(Source: Holmes, Illowsky and Dean, n.d.)

For hypothesis testing with a single sample, the population standard deviation usually cannot be
determined (for example, because the population is too large to collect sufficient data).
Statisticians therefore use the t-value rather than the z-value to test hypotheses H0 and Ha.

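Test statistic (used in all three cases): \( t = \frac{(\bar{x} - \mu_0)\sqrt{n}}{s} \)

H0: \( \mu = \mu_0 \), Ha: \( \mu \ne \mu_0 \) ➔ Reject H0 if \( |t| \ge t^{\alpha/2}_{n-1} \)
H0: \( \mu \le \mu_0 \), Ha: \( \mu > \mu_0 \) ➔ Reject H0 if \( t \ge t^{\alpha}_{n-1} \)
H0: \( \mu \ge \mu_0 \), Ha: \( \mu < \mu_0 \) ➔ Reject H0 if \( t \le -t^{\alpha}_{n-1} \)

Where
\( \mu_0 \): The hypothesized constant value
\( \bar{x} \): The mean of the sample data
n: Sample size
s: Sample standard deviation
\( t^{\alpha/2}_{n-1} \) or \( t^{\alpha}_{n-1} \): The value from the Student (t) distribution table with degrees of freedom DF = n - 1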
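
The test statistic and the three rejection rules can be summarized in a small Python sketch. The function names and the `tail` labels ("two", "right", "left") are illustrative assumptions. The critical value is expected as the positive number read from the t table, so the left-tailed rule compares the statistic with its negative.

```python
import math


def t_statistic(sample_mean, mu0, s, n):
    """One-sample t statistic: (x_bar - mu0) * sqrt(n) / s."""
    return (sample_mean - mu0) * math.sqrt(n) / s


def reject_h0(t, t_critical, tail):
    """Apply the rejection rule; t_critical is the positive table value."""
    if tail == "two":      # Ha: mu != mu0, table value taken at alpha / 2
        return abs(t) >= t_critical
    if tail == "right":    # Ha: mu > mu0, table value taken at alpha
        return t >= t_critical
    if tail == "left":     # Ha: mu < mu0, table value taken at alpha
        return t <= -t_critical
    raise ValueError("tail must be 'two', 'right' or 'left'")


# Quick check with arbitrary numbers: a statistic of -2.5 against a table value of 2.0.
print(reject_h0(-2.5, 2.0, "two"), reject_h0(-2.5, 2.0, "left"), reject_h0(-2.5, 2.0, "right"))
```

Keeping the critical value as an input mirrors the manual procedure, where it is looked up in a table for the chosen significance level and degrees of freedom.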

Two-tailed testing
Example 1: An "Ao Dai" designer assumes that the mean height of adult females is equal to 161 cm.
As evidence, a sample of 41 adult females has an average height of 165 cm and a standard deviation
of 12 cm. Let's test whether the designer's claim holds at the 5% level of significance.

Step 1: Assuming
H0: The mean height of adult females is equal to 161 cm ➔ μ = μ0 = 161 cm
Ha: The mean height of adult females is NOT equal to 161 cm ➔ μ ≠ μ0 = 161 cm

Step 2: In this case μ0 = 161 cm, n = 41, x̄ = 165 cm, s = 12 cm

➔ $t = \frac{(\bar{x} - \mu_0)\sqrt{n}}{s} = \frac{(165 - 161)\sqrt{41}}{12} = 2.13$

Step 3: Confidence = 100% - 5% = 95% ➔ $t_{n-1}^{\alpha/2} = t_{40}^{0.025} = 2.021$
Step 4: Comparison: |t| = 2.13 > $t_{40}^{0.025}$ = 2.021 ➔ Reject H0
Step 5: Conclusion: Reject H0 (or Accept Ha), which means the mean height of adult females is
NOT equal to 161 cm.
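
The calculation in Example 1 can be reproduced with a short Python sketch. The function names `one_sample_t` and `reject_h0` and the `tail` argument are illustrative assumptions, not part of the original analysis, and the critical value is still read from the t table as in Step 3.

```python
import math

def one_sample_t(sample_mean, mu0, s, n):
    """Return t = (x_bar - mu0) * sqrt(n) / s for a one-sample t-test."""
    return (sample_mean - mu0) * math.sqrt(n) / s

def reject_h0(t, t_critical, tail="two"):
    """Apply the rejection rule; t_critical is the positive table value."""
    if tail == "two":
        return abs(t) >= t_critical
    if tail == "right":
        return t >= t_critical
    if tail == "left":
        return t <= -t_critical
    raise ValueError("tail must be 'two', 'right' or 'left'")

# Example 1: x_bar = 165, mu0 = 161, s = 12, n = 41, table value t(40, 0.025) = 2.021
t_value = one_sample_t(165, 161, 12, 41)
decision = reject_h0(t_value, 2.021, tail="two")
print(round(t_value, 2), decision)
```

The same two functions cover the one-tailed forms by passing `tail="right"` or `tail="left"` with the one-tailed table value.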

Example 2: In the statistical task of ABC company, the author compares the mean of the "Unit cost"
variable of the database in the attached excel file with the test value of 20 at the 95% level of
confidence (significance level = 0.05).

Based on that: H0 assumes that the mean of the "Unit cost" variable is equal to 20.
Ha assumes that the mean of the “Unit cost” variable is NOT equal to 20.

(Source: attached spv file, 2021)

(Source: attached spv file, 2021)

The two above tables of results from SPSS showed:


• The Sig. (2-tailed) value = 0.000 < alpha = 0.05 = 5% ➔ The hypothesis H0 (the mean of
the "Unit cost" variable is equal to 20) is rejected.

• The One-Sample Statistics table reports a mean of 9.076, so the mean value of the "Unit cost"
variable is less than 20 (the t-test value is t = -4.494, corresponding to a significance
level of 0.000 < 0.05).

• Therefore, it is concluded that hypothesis Ha is accepted, which means that the mean of
the “Unit cost” variable is NOT equal to 20.

One-tailed testing
Example 1: A sample of 30 people has an average weight of 45 kg and a standard deviation of 5 kg.
Let's test the claim that the population mean weight is less than or equal to 42 kg at the 95% level
of confidence.

Step 1: Assuming
H0: The mean weight of people is less than or equal to 42 kg ➔ μ ≤ μ0 = 42 kg
Ha: The mean weight of people is greater than 42 kg ➔ μ > μ0 = 42 kg

Step 2: In this case μ0 = 42 kg, n = 30, x̄ = 45 kg, s = 5 kg

➔ $t = \frac{(\bar{x} - \mu_0)\sqrt{n}}{s} = \frac{(45 - 42)\sqrt{30}}{5} = 3.29$

Step 3: Confidence = 95% ➔ Significance = 100% - 95% = 5% ➔ $t_{n-1}^{\alpha} = t_{29}^{0.05} = 1.699$
Step 4: Comparison: t = 3.29 > $t_{29}^{0.05}$ = 1.699 ➔ Reject H0

Step 5: Conclusion: Reject H0 (or Accept Ha), which means the mean weight of people is greater
than 42 kg.

Example 2: In the statistical task of ABC company, the author tested whether the mean of the "Unit Sold"
variable of the database in the attached excel file is less than or equal to $51 at the 95% level of
confidence (significance level = 0.05) and the 99% level of confidence (significance level = 0.01) by
using the t-value and the p-value of the test statistic.

Based on that: H0 assumes that the mean of the "Unit sold" variable is less than or equal to $51.
Ha assumes that the mean of the “Unit sold” variable is greater than $51.

(Source: Attached excel file, 2021)
After testing H0 in Excel, the above table presented the following findings:
• t-multiple at the 5% level of significance = -1.6766
• t-multiple at the 1% level of significance = -2.4049
• t-value of the test = -0.6454
• p-value of the test = 0.7392

Comparison:
By using the t-value (right-tailed test, so H0 is rejected only when t ≥ |t-multiple|)
• At the 5% level of significance: t = -0.6454 < |t-multiple| = |-1.6766| = 1.6766 ➔ Accept H0
• At the 1% level of significance: t = -0.6454 < |t-multiple| = |-2.4049| = 2.4049 ➔ Accept H0

By using the p-value
• At the 5% level of significance: p-value = 0.7392 > alpha = 0.05 ➔ Accept H0
• At the 1% level of significance: p-value = 0.7392 > alpha = 0.01 ➔ Accept H0

Conclusion: it is concluded that hypothesis H0 is accepted, and Ha is rejected, which means that the
mean of the “Unit sold” variable is less than or equal to $51.

3.3 Two samples T-test and Independent Sample T-test: Estimation and Hypotheses
testing.
3.3.1 Independent Samples
❖ Estimation
To estimate a difference of population means by using Independent Samples, the “t distribution”
is used because the population standard deviations are not available (assuming that the
population distributions are Normal) (Holmes, Illowsky and Dean, n.d.). Besides that, in statistics, there
are two situations which differ in the formulas that need to be used:

• Situation 1: the Standard Deviations are SIMILAR.


• Situation 2: the Standard Deviations are DIFFERENT.

To check whether the two standard deviations are similar or different, a "rough rule" is used: divide the
larger Standard Deviation by the smaller. If the result is < 1.5, they are usually treated as similar;
otherwise they are treated as different (Holmes, Illowsky and Dean, n.d.).

Example of Situation 1: Let’s estimate the difference between the mean weight of male and female
people at 95% level of confidence (𝜶 = 𝟎. 𝟎𝟓)

Male Female
Mean 𝑥̅1 = 60 kg 𝑥̅2 = 55 kg
Standard deviation 𝑠1 = 4 kg 𝑠2 = 3 kg
Sample size 𝑛1 = 19 𝑛2 = 21

In this case s1 : s2 = 4 : 3 = 1.33 < 1.5 ➔ the Standard Deviations are SIMILAR
Step 1: Degree of freedom (DF) when the standard deviations are similar:

DF = 𝑛1 + 𝑛2 − 2 = 19 + 21 – 2 = 38
Step 2: Looking at the critical value of t table ➔ $t_{DF}^{\alpha/2} = t_{38}^{0.025} = 2.024$

Step 3: Pooled variance of the two independent samples:

$S_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(19 - 1)4^2 + (21 - 1)3^2}{38} = 12.316$

Step 4: Difference of Population Means:

$\mu_1 - \mu_2 = \bar{x}_1 - \bar{x}_2 \pm t_{DF}^{\alpha/2} \sqrt{S_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$

➔ $\mu_1 - \mu_2 = 60 - 55 \pm 2.024 \times \sqrt{12.316 \left(\frac{1}{19} + \frac{1}{21}\right)} = 5 \pm 2.25 = (2.75 ; 7.25)$

Step 5: Conclusion.

We are 95% confident that the difference between male and female average weight is 5 ± 2.25 =
(2.75 ; 7.25) kg. It means that, on average, males are heavier than females by between 2.75 and 7.25
kg according to the above statistical result.
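
The pooled-variance interval of Situation 1 can be checked with the following Python sketch. The function name `pooled_ci` is an assumption made for illustration, and the critical value 2.024 is taken from the t table as in Step 2.

```python
import math

def pooled_ci(mean1, s1, n1, mean2, s2, n2, t_critical):
    """Confidence interval for mu1 - mu2 when the standard deviations are similar."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    margin = t_critical * math.sqrt(sp2 * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    return df, sp2, (diff - margin, diff + margin)

# Male: mean 60, s 4, n 19; Female: mean 55, s 3, n 21
df, sp2, (lower, upper) = pooled_ci(60, 4, 19, 55, 3, 21, t_critical=2.024)
print(df, round(sp2, 3), round(lower, 2), round(upper, 2))
```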

Example of Situation 2: Let’s estimate the difference of the mean GPA score between female and
male students at 99% level of confidence (𝜶 = 𝟎. 𝟎𝟏)

Female Male
Mean 𝑥̅1 = 6.3 𝑥̅2 = 5.8
Standard deviation 𝑠1 = 3 𝑠2 = 1.8
Sample size 𝑛1 = 25 𝑛2 = 15

In this case s1 : s2 = 3 : 1.8 = 1.67 > 1.5 ➔ the Standard Deviations are DIFFERENT
Step 1: Degree of freedom (DF) when the standard deviations are different:

$DF = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}} = \frac{\left(\frac{3^2}{25} + \frac{1.8^2}{15}\right)^2}{\frac{(3^2/25)^2}{25 - 1} + \frac{(1.8^2/15)^2}{15 - 1}} = 37.99 \approx 38$

Step 2: Looking at the critical value of t table ➔ $t_{DF}^{\alpha/2} = t_{38}^{0.005} = 2.712$

Step 3: Difference of Population Means:

$\mu_1 - \mu_2 = \bar{x}_1 - \bar{x}_2 \pm t_{DF}^{\alpha/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$

➔ $\mu_1 - \mu_2 = 6.3 - 5.8 \pm 2.712 \times \sqrt{\frac{3^2}{25} + \frac{1.8^2}{15}} = 0.5 \pm 2.06 = (-1.56 ; 2.56)$

Step 4: Conclusion:

We are 99% confident that the difference between female and male students’ average GPA score is 0.5 ±
2.06 = (-1.56 ; 2.56). Because this interval contains 0, the result does not show that female students
score higher than male students on average: the true difference could lie anywhere from 1.56 points in
favour of male students to 2.56 points in favour of female students.
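
A minimal Python sketch of the unequal-variance calculation in Situation 2 follows. The name `welch_ci` is illustrative only, and the critical value 2.712 is the table value used in Step 2.

```python
import math

def welch_ci(mean1, s1, n1, mean2, s2, n2, t_critical):
    """Interval for mu1 - mu2 when the standard deviations are different."""
    v1, v2 = s1**2 / n1, s2**2 / n2          # variance of each sample mean
    # Degrees of freedom from the formula in Step 1
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    margin = t_critical * math.sqrt(v1 + v2)
    diff = mean1 - mean2
    return df, (diff - margin, diff + margin)

# Female: mean 6.3, s 3, n 25; Male: mean 5.8, s 1.8, n 15
df, (lower, upper) = welch_ci(6.3, 3, 25, 5.8, 1.8, 15, t_critical=2.712)
print(round(df, 2), round(lower, 2), round(upper, 2))
contains_zero = lower < 0 < upper   # True means no clear difference at this level
```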

❖ Hypothesis testing (2-tailed)


In statistics, the independent sample t-test is a technique used to compare the means of two
independent samples. When two samples are taken from the same population, the means of the two
samples are expected to be similar. However, if the samples are gathered from two different
populations, the sample means may differ. In this case, the test is used to draw conclusions about the
means of the two populations and to decide whether or not they are similar (statisticssolutions.com, 2021).

Formula of t-value test: $t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$

Example 1: Based on the following information, let’s test the hypothesis that there is a difference
between the mean weight of boys and girls aged 10 to 15 at the 5% significance level (𝜶 =
𝟎. 𝟎𝟓).
Boys Girls
Mean 𝑥̅1 = 28 𝑥̅2 = 23
Standard deviation 𝑠1 = 4 𝑠2 = 2
Sample size 𝑛1 = 30 𝑛2 = 28

In this case s1 : s2 = 4 : 2 = 2 > 1.5 ➔ the Standard Deviations are DIFFERENT
Step 1: Assuming:
H0: There is NO difference between the mean weight of boys and girls aged 10 to 15 ➔ μ1 − μ2 = 0
Ha: There is a difference between the mean weight of boys and girls aged 10 to 15 ➔ μ1 − μ2 ≠ 0

Step 2: t-value of test:

$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{(28 - 23) - 0}{\sqrt{\frac{4^2}{30} + \frac{2^2}{28}}} = 6.08$

Step 3: Degree of freedom (DF) when the standard deviations are DIFFERENT

$DF = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}} = \frac{\left(\frac{4^2}{30} + \frac{2^2}{28}\right)^2}{\frac{(4^2/30)^2}{30 - 1} + \frac{(2^2/28)^2}{28 - 1}} = 43.28 \approx 43$

Looking at the critical value of t table ➔ $t_{DF}^{\alpha/2} = t_{43}^{0.025} = 2.017$
Step 4: Comparison: |t| = 6.08 > $t_{43}^{0.025}$ = 2.017 ➔ Reject H0
Step 5: Conclusion: Reject H0 (or Accept Ha), which means that there is a difference between the mean
weight of boys and girls aged 10 to 15.

Example 2: Independent samples t-test by using SPSS


This is a statistical test that tests whether there is a statistically significant difference between means
in two statistically unrelated groups. When using SPSS statistical software, analysts use the observed
significance level (Sig.) to accept or reject the initial hypothesis. There are two situations: different
variance and similar variance (Holmes, Illowsky and Dean, n.d.).

Situation 1: If the Sig. value of Levene’s test < alpha = 0.05 ➔ Variance is different (read the "Equal variances not assumed" row), in which:
• The Sig. value of t-test < 0.05 ➔ Reject H0: there is a difference between the means of the 2 independent groups.
• The Sig. value of t-test > 0.05 ➔ Accept H0: there is no difference between the means of the 2 independent groups.
Situation 2: If the Sig. value of Levene’s test > alpha = 0.05 ➔ Variance is similar (read the "Equal variances assumed" row), in which:
• The Sig. value of t-test < 0.05 ➔ Reject H0: there is a difference between the means of the 2 independent groups.
• The Sig. value of t-test > 0.05 ➔ Accept H0: there is no difference between the means of the 2 independent groups.
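
The two-stage reading of the SPSS output can be written as a small decision function. This is only a sketch of the rule described above: the function name and the Sig. values in the usage line are hypothetical, and "Accept H0" follows the wording of this report (strictly, H0 is "not rejected").

```python
def independent_t_decision(levene_sig, sig_equal_var, sig_unequal_var, alpha=0.05):
    """Pick the SPSS output row from Levene's test, then decide on H0."""
    equal_variances = levene_sig >= alpha
    if equal_variances:
        row, t_sig = "Equal variances assumed", sig_equal_var
    else:
        row, t_sig = "Equal variances not assumed", sig_unequal_var
    decision = "Reject H0" if t_sig < alpha else "Accept H0"
    return row, decision

# Hypothetical Sig. values, for illustration only
print(independent_t_decision(levene_sig=0.01, sig_equal_var=0.08, sig_unequal_var=0.03))
```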

Situation 1: Testing the difference between the mean of “Total Revenue” of Central region and West
region based on a data set of ABC company at 95% level of confidence (alpha = 0.05).

Assuming:
H0: There is no difference between the mean of Total Revenue of Central and West region: μ1 − μ2 = 0
Ha: There is a difference between the mean of Total Revenue of Central and West region: μ1 − μ2 ≠ 0

There are findings from SPSS:

Independent Samples Test (TotalRevenue)
Columns F and Sig.: Levene's Test for Equality of Variances. Remaining columns: t-test for Equality of Means (Lower and Upper: 95% Confidence Interval of the Difference).

                               F        Sig.   t       df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   Lower      Upper
Equal variances assumed        18.011   .000   2.160   48       .036              176.89767         81.88878                12.24924   341.54610
Equal variances not assumed                    2.419   46.260   .020              176.89767         73.11583                29.74544   324.04989

(Source: attached spv file, 2021)

From the above tables:


• The Sig. value of Levene’s Test = 0.000 < alpha = 0.05 ➔ Variance is different, so the "Equal variances not assumed" row is used.
• The Sig. value of t-test (2-tailed) = 0.020 < 0.05 ➔ Reject H0: There is no difference
between the mean of Total Revenue of Central and West region: μ1 − μ2 = 0

Conclusion: Reject H0 and accept Ha, which means there is a difference between the mean of Total
Revenue of Central and West region. In other words, the average total revenue of Central region is
different from that of West region. As the Group Statistics table shows, the mean total revenue
of Central region is 379.962 and that of the West region is 203.064.

The Means plots of Total revenue variable and Region variable
(Source: attached spv file, 2021)

Situation 2: Testing the difference between the mean of “Unit sold” of Central region and West region
based on a data set of ABC company at the 95% level of confidence (alpha = 0.05).

Assuming:
H0: There is no difference between the mean of Unit sold of Central and West region: μ1 − μ2 = 0
Ha: There is a difference between the mean of Unit sold of Central and West region: μ1 − μ2 ≠ 0

There are findings from SPSS:

Independent Samples Test (UnitSold)
Columns F and Sig.: Levene's Test for Equality of Variances. Remaining columns: t-test for Equality of Means (Lower and Upper: 95% Confidence Interval of the Difference).

                               F      Sig.   t      df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   Lower     Upper
Equal variances assumed        .910   .345   .216   48       .830              1.783             8.240                   -14.785   18.351
Equal variances not assumed                  .213   38.552   .833              1.783             8.376                   -15.166   18.732

(Source: attached spv file, 2021)

From the above tables:


• The Sig. value of Levene’s Test = 0.345 > alpha = 0.05 ➔ Variance is similar, so the "Equal variances assumed" row is used.
• The Sig. value of t-test (2-tailed) = 0.830 > 0.05 ➔ Accept H0: There is no difference
between the mean of Unit sold of Central and West region: μ1 − μ2 = 0

Conclusion: Accept H0 and reject Ha, which means there is no difference between the mean of Unit
sold of Central and West region. In other words, the average unit sold of Central region is not
significantly different from that of West region. As the Group Statistics table shows, the mean of
unit sold of Central region is 49.13 and that of the West region is 47.35.

3.3.2 Dependent Samples

❖ Estimation
Situation: Two samples are considered dependent when the sample sizes are equal and
each member of the first sample is associated with the corresponding member of the second sample.

Example: Using the following information, let’s estimate the mean difference in scores between team
A and team B at 95% level of confidence (𝜶 = 𝟎. 𝟎𝟓).

Team A 23 20 19 21
Team B 20 21 17 19

Step 1: Diff (X_D) = Team A – Team B = 3, -1, 2, 3

Step 2: In this case: n_D = 4, x̄_D = (3 − 1 + 2 + 3)/4 = 1.75

$s_D = \sqrt{\frac{\sum (X_D - \bar{x}_D)^2}{n_D - 1}} = \sqrt{\frac{(3 - 1.75)^2 + (-1 - 1.75)^2 + (2 - 1.75)^2 + (3 - 1.75)^2}{4 - 1}} = 1.89$

Step 3: Degree of Freedom (DF) = n_D – 1 = 4 – 1 = 3

➔ Looking at the critical value of t table ➔ $t_{DF}^{\alpha/2} = t_{3}^{0.025} = 3.182$

Step 4: $\mu_D = \bar{x}_D \pm t_{DF}^{\alpha/2} \times \frac{s_D}{\sqrt{n_D}} = 1.75 \pm 3.182 \times \frac{1.89}{\sqrt{4}} \approx 1.75 \pm 3 = (-1.25 ; 4.75)$

Step 5: Conclusion:
The mean difference in scores between team A and team B at 95% level of confidence is in the range
from -1.25 to 4.75 points. In other words, the average difference may be as high as 4.75 points (in
favour of team A) and as low as -1.25 (“-” indicates that it could be in favour of team B).

❖ Hypothesis testing
In statistics, the dependent sample t-test compares the means of two related sets of measurements,
for example the same subjects measured on two occasions. Because each observation in one sample must
be paired with an observation in the other, it is also known as the paired t-test. Furthermore, it is a
statistical procedure for determining whether the mean difference between two sets of paired
observations is zero. In particular, the dependent sample t-test is used when the cases in one sample
are related to the cases in the other sample (statisticssolutions.com, 2021).

Formula of t-value test: $t = \frac{\bar{x}_D - \mu_D}{s_D / \sqrt{n_D}}$

Example 1: Using the following information, let’s test the theory that team A scores higher on average
than team B at the 95% level of confidence (𝜶 = 𝟎. 𝟎𝟓).

Team A 23 20 19 21
Team B 20 21 17 19

Diff (𝑋𝐷 ) = Team A – Team B = 3 -1 2 3

Step 1: Assuming
H0: Team A on average scores less than or equal to Team B ➔ μ1 − μ2 = μ_D ≤ 0
Ha: Team A on average scores higher than Team B ➔ μ1 − μ2 = μ_D > 0

Step 2: In this case: n_D = 4, x̄_D = (3 − 1 + 2 + 3)/4 = 1.75

$s_D = \sqrt{\frac{\sum (X_D - \bar{x}_D)^2}{n_D - 1}} = \sqrt{\frac{(3 - 1.75)^2 + (-1 - 1.75)^2 + (2 - 1.75)^2 + (3 - 1.75)^2}{4 - 1}} = 1.89$

Step 3: Degree of Freedom (DF) = n_D – 1 = 4 – 1 = 3

➔ Looking at the critical value of t table ➔ $t_{DF}^{\alpha} = t_{3}^{0.05} = 2.353$

Step 4: t-value of test: $t = \frac{\bar{x}_D - \mu_D}{s_D / \sqrt{n_D}} = \frac{1.75 - 0}{1.89 / \sqrt{4}} = 1.851$

Step 5: Comparison: t = 1.851 < $t_{3}^{0.05}$ = 2.353 ➔ Accept H0

Conclusion: Accept H0 or reject Ha. In other words, there is not enough evidence to support the theory
that Team A scores higher on average than Team B at the 95% level of confidence.

Example 2: Testing the difference between the mean of the Unit Price variable and the mean of the
Unit Cost variable (both amounts of money) based on a data set of ABC company at the 95% level of
confidence (alpha = 0.05).
Assuming:
H0 There is no difference in the average of the amount of money
between the unit price and unit cost.
➔ 𝜇1 − 𝜇2 = 0
Ha There is difference in the average of the amount of money
between the unit price and unit cost.
➔ 𝜇1 − 𝜇2 ≠ 0

There are findings from SPSS

(Source: attached spv file, 2021)

Based on the Paired Sample Statistics table, the results show that the “Unit Price” and
“Unit Cost” variables each have 50 observations.

(Source: attached spv file, 2021)

From the Paired Samples Correlation table:


• The Sig. value = 0.000 < alpha = 0.05 ➔ The correlation between the “Unit price” and “Unit cost”
variables is statistically significant.
• The Correlation coefficient = 0.998 ➔ Unit price and Unit cost variables have a very strong positive
relationship.

Paired Samples Test
Paired Differences: Mean, Std. Deviation, Std. Error Mean, Lower and Upper (95% Confidence Interval of the Difference).

                                Mean      Std. Deviation   Std. Error Mean   Lower     Upper      t       df   Sig. (2-tailed)
Pair 1   UnitPrice - UnitCost   8.02460   11.39851         1.61199           4.78518   11.26402   4.978   49   .000

(Source: attached spv file, 2021)

From the Paired Samples Test table above:


• The sig. value (2-tailed) = 0.000 < alpha = 0.05 ➔ Reject H0: There is no difference in the
average of the amount of money between the unit price and unit cost.

Conclusion: Reject H0 or accept Ha. It means there is difference in the average of the amount of money
between the unit price and unit cost.

4. Measuring the association between two variables (from the dataset)
4.1 Correlation analysis

Correlation is a bivariate analysis that determines the strength of the relationship between two
variables as well as the direction of the relationship.

Moreover, the value of the correlation coefficient always ranges between -1 and +1, and its magnitude
shows the strength of the association. A value of ± 1 indicates that the two variables are
perfectly linearly associated. The relationship between the two variables becomes weaker as the
correlation coefficient approaches zero. The sign of the correlation coefficient
reflects the direction of the relationship: a + sign indicates a positive relationship, a –
sign indicates a negative relationship, and a value of 0 indicates that there is no linear relationship
between the variables (statisticssolutions.com, 2021).

(Source: Holmes, Illowsky and Dean, n.d.)

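Formula of Correlation coefficient:

Population: $\rho = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{\sqrt{\sum (x_i - \mu_x)^2 \sum (y_i - \mu_y)^2}}$

Sample: $r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$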

(Source: Holmes, Illowsky and Dean, n.d.)
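
The sample formula translates directly into code. The sketch below is a plain-Python illustration; the function name `pearson_r` and the two short lists are made-up examples, not values from the ABC company dataset.

```python
import math

def pearson_r(x, y):
    """Sample correlation coefficient r from the formula above."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up data for illustration: as price rises, units sold fall
price = [10, 12, 15, 19, 25]
units = [50, 47, 40, 33, 20]
r = pearson_r(price, units)
print(round(r, 3))
```

A value of r close to -1, as here, corresponds to a strong negative linear relationship.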

In the statistical task of ABC company, the author determined and analyzed the correlation
relationship between the variable "Unit price" and the variables "Unit cost", "Unit sold", "Profit" at
5% significance level (Alpha = 0.05) by using SPSS statistical software.

Correlations
                                  UnitPrice   UnitCost   UnitSold   Profit
UnitPrice   Pearson Correlation   1           .998**     -.410**    .097
            Sig. (2-tailed)                   .000       .003       .503
            N                     50          50         50         50
UnitCost    Pearson Correlation   .998**      1          -.417**    .052
            Sig. (2-tailed)       .000                   .003       .719
            N                     50          50         50         50
UnitSold    Pearson Correlation   -.410**     -.417**    1          .357*
            Sig. (2-tailed)       .003        .003                  .011
            N                     50          50         50         50
Profit      Pearson Correlation   .097        .052       .357*      1
            Sig. (2-tailed)       .503        .719       .011
            N                     50          50         50         50

**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).


(Source: attached spv file, 2021)

Based on the Correlation table above:


• The Sig. value of the “Unit price” and “Unit cost” variables = 0.000 < alpha = 0.05 ➔ There is
a statistically significant linear relationship.
• The Pearson Correlation = 0.998 ➔ The direction of the relationship is positive (i.e., Unit price
and Unit cost variables are positively correlated), meaning that these variables tend to increase
together.

• The Sig. value of the “Unit price” and “Unit sold” variables = 0.003 < alpha = 0.05 ➔ There is
a statistically significant linear relationship.
• The Pearson Correlation = -0.410 ➔ The direction of the relationship is negative (i.e., Unit
price and Unit sold variables are negatively correlated).

• The Sig. value of the “Unit price” and “Profit” variables = 0.503 > alpha = 0.05 ➔ There is
no statistically significant linear correlation between these two variables.

4.2 Regression analysis and simple forecasting

Regression analysis is a set of statistical methods for estimating relationships between one or more
independent variables and a dependent variable (typically indicated by Y). In addition, it can be used
to determine the strength of the relationship between variables and to forecast how they will interact
in the future. In statistics, the two basic types of regression analysis are simple
linear regression and multiple linear regression (corporatefinanceinstitute.com, 2021).

❖ Simple linear regression model


Simple linear regression is a model that examines the relationship between a dependent variable and
an independent variable. Furthermore, the purpose of this analysis is to forecast the value of a
variable (dependent variable) based on the value of another variable (independent variable)
(corporatefinanceinstitute.com, 2021). The simple linear model is expressed using the following
equation:

Y = a + bX
Where:
• X: the independent variable
• Y: the dependent variable
• a: the constant
• b: the coefficient of X (Holmes, Illowsky and Dean, n.d.).

In the statistical task, the author analyzes the simple linear regression model of the independent
variable "Unit Cost" and the dependent variable "Unit Price" from a data set of ABC company at the 95%
level of confidence by using SPSS statistical software.

ANOVA (b)
Model             Sum of Squares   df   Mean Square   F           Sig.
1   Regression    39675.710        1    39675.710     13169.550   .000 (a)
    Residual      144.609          48   3.013
    Total         39820.319        49

a. Predictors: (Constant), UnitCost
b. Dependent Variable: UnitPrice


(Source: attached spv file, 2021)
From the above ANOVA table:
The sig. value of F-test = 0.000 < alpha 0.05 ➔ Regression model is statistically meaningful. It means
the simple linear regression model is proper to the data set and is usable.

Coefficients (a)
B and Std. Error: Unstandardized Coefficients. Beta: Standardized Coefficients. Tolerance and VIF: Collinearity Statistics.

Model             B       Std. Error   Beta   t         Sig.   Tolerance   VIF
1   (Constant)    2.074   .278                7.455     .000
    UnitCost      1.656   .014         .998   114.759   .000   1.000       1.000

a. Dependent Variable: UnitPrice
(Source: attached spv file, 2021)
• The equation of the simple linear regression model is Y = a + bX. In this situation “Y”
stands for Unit Price, and “X” stands for Unit Cost.
• From the findings of the Coefficients table above (a = 2.074, b = 1.656):
➔ Y = 2.074 + 1.656*X
➔ Unit Price = 2.074 + 1.656*Unit Cost
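
For simple forecasting, the fitted equation can be wrapped in a small function. This is a sketch that uses the two coefficients reported in the Coefficients table; the function name and the unit cost of 10 are hypothetical.

```python
INTERCEPT = 2.074   # "(Constant)" B from the Coefficients table
SLOPE = 1.656       # "UnitCost" B from the Coefficients table

def predict_unit_price(unit_cost):
    """Forecast Unit Price from Unit Cost with Y = a + bX."""
    return INTERCEPT + SLOPE * unit_cost

# Hypothetical unit cost of 10, for illustration only
forecast = predict_unit_price(10)
print(round(forecast, 3))
```

Forecasts of this kind are most reliable for unit costs inside the range observed in the data set.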

❖ Multiple linear regression model


Multiple linear regression analysis is substantially the same as the simple linear model, with the
exception that several independent variables are employed in the analysis
(corporatefinanceinstitute.com, 2021). In addition, multiple linear regression is represented
mathematically as:

Y = a + b1X1 + b2X2 + b3X3 + ... + btXt + u

Where:
• Y: the dependent variable
• Xi: the independent variables
• a: the intercept.
• bi: the coefficients
• u: the regression residual (error) (Holmes, Illowsky and Dean, n.d.).

In findings of multiple linear regression model, some coefficients need to be considered:


• R-square: To evaluate the scatter of the data points around the fitted regression line.
• Adjusted R Square: to reflect the influence of the independent variables on the dependent
variable.
• Durbin-Watson (DW): to test the autocorrelation of adjacent errors (residuals), with values ranging
from 0 to 4.
  - If the value is from 1.5 to 2.5 ➔ no autocorrelation
  - If the value is near 0 ➔ positive autocorrelation
  - If the value is near 4 ➔ negative autocorrelation
• Significance of F-test: to test whether the multiple linear regression model is generalizable and
applicable to the population or not.
• Variance inflation factor (VIF): to test multi-collinearity. If VIF of an independent variable is
greater than 10 ➔ multi-collinearity exists.
• Significance of t- test: to test the meaning of regression value to consider whether
independent variables influence on dependent variable or not.
• Standardized Coefficients β: to examine whether the influence level of independent variables
on dependent variables is strong or weak (Holmes, Illowsky and Dean, n.d.).

The author analyzes the multiple linear regression model of the independent variables "Unit Sold",
"Total Revenue", "Total Cost" and the dependent variable "Profit" from a data set of ABC company at
the 95% level of confidence by using SPSS statistical software.

Assuming: H0: 𝑅 2 = 0 ➔ The model does not exist
Ha: 𝑅 2 ≠ 0 ➔ The model does exist

Model Summary (b)
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .828 (a)   .685       .664                133.99966                    1.849

a. Predictors: (Constant), TotalCost, TotalRevenue, UnitSold
b. Dependent Variable: Profit


(Source: attached spv file, 2021)
Based on the Model Summary table above:
• R-square value = 0.685 ≠ 0 ➔ H0 is rejected, which means the model does exist (the formal test of
this hypothesis is the F-test reported in the ANOVA table).
• Adjusted R square value = 0.664 ➔ it means that the 3 independent variables explain 66.4% of
the variation of the dependent variable "Profit"; the remaining 33.6% is due to out-of-model
variables and random error.
• Durbin-Watson value = 1.849, in the range from 1.5 to 2.5 ➔ there is no autocorrelation
among the residuals.

ANOVA (b)
Model             Sum of Squares   df   Mean Square   F        Sig.
1   Regression    1794764.587      3    598254.862    33.318   .000 (a)
    Residual      825971.810       46   17955.909
    Total         2620736.398      49

a. Predictors: (Constant), TotalCost, TotalRevenue, UnitSold
b. Dependent Variable: Profit


(Source: attached spv file, 2021)

The findings on ANOVA table above showed that


• Sig. value of F-test = 0.000 < α = 0.05 ➔ the regression model is statistically significant. Thus,
the multiple linear regression model is proper to analyze.
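
The R Square, Adjusted R Square and F values reported by SPSS can be recovered from the sums of squares in the ANOVA table, which is a useful consistency check. The sketch below uses only figures from the tables above; the variable names are illustrative.

```python
ss_regression = 1794764.587   # Regression sum of squares (ANOVA table)
ss_residual = 825971.810      # Residual sum of squares (ANOVA table)
n, k = 50, 3                  # 50 observations, 3 independent variables

ss_total = ss_regression + ss_residual
r_squared = ss_regression / ss_total
# Adjusted R-square penalises the number of predictors
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
# F = mean square regression / mean square residual
f_stat = (ss_regression / k) / (ss_residual / (n - k - 1))

print(round(r_squared, 3), round(adj_r_squared, 3), round(f_stat, 3))
```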

Coefficients (a)
B and Std. Error: Unstandardized Coefficients. Beta: Standardized Coefficients. Tolerance and VIF: Collinearity Statistics.

Model               B       Std. Error   Beta   t       Sig.   Tolerance   VIF
1   (Constant)      5.701   42.488              .134    .894
    UnitSold        .256    .742         .031   .346    .731   .834        1.199
    TotalRevenue    .057    .067         .072   .845    .403   .946        1.057
    TotalCost       .861    .101         .795   8.553   .000   .793        1.262

a. Dependent Variable: Profit
(Source: attached spv file, 2021)

From the Coefficients table above:


• No independent variable has a VIF value > 10 ➔ there is no multi-collinearity between the
above independent variables.

• Sig. value of t-test of Unit Sold = 0.731 > α = 0.05, which means the independent variable “Unit
Sold” does NOT have a statistically significant influence on the dependent variable “Profit”.
• Sig. value of t-test of Total Revenue = 0.403 > α = 0.05, which means the independent variable
“Total Revenue” does NOT have a statistically significant influence on the dependent variable “Profit”.
• Sig. value of t-test of Total cost = 0.000 < α = 0.05, which means the independent variable “Total
cost” has a statistically significant influence on the dependent variable “Profit”.

Unstandardized coefficients are to create regression equation: y = a + b𝑥1 + c𝑥2 + d𝑥3


Y: Profit 𝑥1 : Unit sold 𝑥2 : Total revenue 𝑥3 : Total cost
a = 5.701 b = 0.256 c = 0.057 d = 0.861

➔ Therefore, the regression equation is formed:

Y = 5.701 + 0.256𝑥1 + 0.057𝑥2 + 0.861𝑥3


• If all other factors are unchanged, when Unit Sold rises by 1 unit, Profit is predicted to increase
by 0.256 (the constant 5.701 does not enter the change).
• If all other factors are unchanged, when Total Revenue rises by 1 unit, Profit is predicted to
increase by 0.057.
• If all other factors are unchanged, when Total Cost rises by 1 unit, Profit is predicted to increase
by 0.861.
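
A quick way to see the effect of a one-unit change is to predict Profit twice and take the difference: the change equals the coefficient of the variable that moved, and the constant cancels out. The sketch below uses the unstandardized coefficients from the Coefficients table; the baseline input values are hypothetical.

```python
COEF = {"const": 5.701, "unit_sold": 0.256, "total_revenue": 0.057, "total_cost": 0.861}

def predict_profit(unit_sold, total_revenue, total_cost):
    """Y = a + b*x1 + c*x2 + d*x3 with the unstandardized coefficients."""
    return (COEF["const"]
            + COEF["unit_sold"] * unit_sold
            + COEF["total_revenue"] * total_revenue
            + COEF["total_cost"] * total_cost)

# Hypothetical baseline values, for illustration only
base = predict_profit(50, 300, 200)
plus_one_unit_sold = predict_profit(51, 300, 200)
change = plus_one_unit_sold - base   # equals the Unit Sold coefficient
print(round(change, 3))
```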

Standardized Coefficients: these are better for presenting suggestions because they put all variables
on a common scale, and the standardized regression equation is written in decreasing order of the
independent variables' influence level.

Profit = 0.795*Total Cost + 0.072*Total Revenue + 0.031*Unit Sold

• If all other factors are unchanged, when Total Cost rises by 1 standard deviation, Profit is
predicted to increase by 0.795 standard deviations.
• If all other factors are unchanged, when Total Revenue rises by 1 standard deviation, Profit is
predicted to increase by 0.072 standard deviations.
• If all other factors are unchanged, when Unit Sold rises by 1 standard deviation, Profit is
predicted to increase by 0.031 standard deviations.

(Source: attached spv file, 2021)

Based on the histogram with a normal curve above, the mean = 1.47 × 10^-16 is approximately 0 and the
standard deviation = 0.969 is close to 1; therefore, it can be concluded that the residuals are
approximately normally distributed.

Linear relation between independent and dependent variables

(Source: attached spv file, 2021)

Based on the scatter plot above, the points are distributed randomly around the zero line;
therefore, it can be concluded that the assumption of a linear relationship between the independent
and dependent variables is not violated.

5. The differences in application among descriptive, exploratory, and confirmatory analysis
techniques in general.

Descriptive, exploratory, and confirmatory statistics are the three basic categories that statistical
approaches belong to. Therefore, the contrasts between these three will be discussed in this section,
as well as how they affect the field of data analytics.

Descriptive statistics
• The science of quantitatively describing the main characteristics of a set of data. In essence, it
describes the features of populations or samples (datascientistinsights.com, 2021).
• Descriptive statistics reports and presentations generally consist of summary data tables, graphics,
and text to explain what the charts and tables are displaying (datascientistinsights.com, 2021).
• Measures of central tendency (mean, median, mode) and measures of variability (variance, standard
deviation, etc.) are commonly used in descriptive statistics (mymarketresearchmethods.com, 2020).

Exploratory statistics
• A method of analyzing databases to examine previously unknown relationships. Moreover, exploratory
models are suitable for finding new connections (datascientistinsights.com, 2021).
• Exploratory analyses can also be used to define future research issues or investigations.
Furthermore, it is rarely the complete and definitive answer to the question at hand, but rather the
beginning. By using exploratory analysis, data analysts can look for clues and trends that will help
them reach conclusions (online.ndm.edu, 2018).
• Exploratory analysis includes a variety of tasks, such as finding errors and missing data,
identifying important variables in the data set, testing a hypothesis connected to a specific model,
and developing a model that can describe the data in the most concise manner possible
(online.ndm.edu, 2018).

Confirmatory statistics
• A method used to evaluate evidence by testing the assumptions about the data. It is similar to
examining evidence and asking witnesses in a court to decide the defendant's guilt or innocence
(online.ndm.edu, 2018).
• An important factor of confirmatory data analysis is quantifying aspects such as the extent to which
any deviation from the model that analysts have established could have occurred by chance and when
they should begin doubting their model (sisense.com, 2021).
• Testing hypotheses, creating estimates, regression analysis (estimating the connection between
variables), and variance analysis (evaluating the difference between the planned and actual findings)
are typical tools of confirmatory analysis (online.ndm.edu, 2018).

III. Applying a range of statistical methods used in business planning for quality, inventory, and
capacity management
1. Measuring the variability in business processes or quality management

A measure of variability is a summary statistic that indicates the amount of dispersion in a data
collection. In statistics, there are three typical measures of variability: range,
variance, and standard deviation (as discussed in previous sections). In particular, a low dispersion
shows that the data points tend to be grouped tightly around the center. In contrast, a high dispersion
indicates that these points tend to fall further apart (Frost, 2021).

All production and measurement procedures in the business world are subject to variation over
time. As a result, the significance of variation in quality management cannot be overstated. In
particular, when the variance is small, outputs deviate less from the target, and conversely.
Therefore, if the level of deviation is kept to a minimum, quality management becomes easier
and more convenient, and the outcome is highly advantageous. The probability
distributions will be examined in depth in the next part to demonstrate the importance of variation in
quality management in relation to manufacturing distributions.

2. Measuring the probability by using probability distributions to business operations and processes

A probability distribution describes the likelihood of each possible outcome of a random variable.
To put it another way, the values the variable takes are governed by the underlying probability
distribution (Frost, 2021). There are many types of probability distributions. Three of them, the
Normal, Poisson, and Binomial distributions, are explained in detail below.

2.1 Normal distribution

The normal distribution, also called the Gaussian distribution, is a symmetrical probability
distribution centered on the mean, indicating that data near the mean occur more frequently than
data far from it. On a graph, the normal distribution appears as a bell curve (investopedia.com,
2021). The bell curve is symmetrical: half of the data fall to the left of the mean and the other
half fall to the right. Furthermore, the peak of the normal curve lies at the mean, which is also
the median and mode of the distribution (statisticshowto.com, n.d.).

A normal distribution
(Source: statisticshowto.com, n.d.)

Aside from that, the standard deviation controls the spread of the normal distribution. A smaller
standard deviation means that the data are tightly packed around the mean, resulting in a taller,
narrower curve. On the other hand, a larger standard deviation means that the data are spread out
around the mean, making the curve flatter and wider (statisticshowto.com, n.d.).

(Source: Holmes, A., Illowsky, B. and Dean, S., n.d)

The percentages of values that fall within some commonly used intervals are:


• 68.3% of the values of a normal random variable are within plus or minus one standard
deviation of its mean.
• 95.4% of the values of a normal random variable are within plus or minus two standard
deviations of its mean.
• 99.7% of the values of a normal random variable are within plus or minus three standard
deviations of its mean.
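
These three percentages can be checked numerically. The following is a minimal sketch that uses `NormalDist` from Python's standard library; it is an added illustration and not part of the original analysis.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


shares = {k: share_within(k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within +/-{k} standard deviations: {share:.1%}")
```

Because every normal distribution can be rescaled to the standard normal, the same shares hold for any mean and standard deviation.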

A standard normal distribution is a normal distribution with a mean of 0 and a standard deviation
of 1. The letter z is widely used to represent a random variable with this distribution. Any normal
random variable X with mean μ and standard deviation σ can be converted to z with the following
formula:

$$z = \frac{X - \mu}{\sigma}$$
(Source: Holmes, A., Illowsky, B. and Dean, S., n.d)

Example 1: The weight X of a product is normally distributed with a mean of 73.5 g and a standard
deviation of 10.5 g. What is the probability that X ≤ 80 g?

Step 1: $z = \frac{X - \mu}{\sigma} = \frac{80 - 73.5}{10.5} = 0.6190$

Step 2: Looking up the z distribution table: P (X ≤ 80) = P (z < 0.6190) ≈ 0.7321 = 73.21% (the
table entry for z = 0.62 is 0.7324; the entry for z = 0.61 is 0.7291)

Step 3: Conclusion: The probability of X ≤ 80 g is approximately 73.21%
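
As a cross-check on the table lookup, the same probability can be computed directly from the normal cumulative distribution function. This is a minimal sketch using the figures from Example 1; the exact CDF gives roughly 0.732, while a printed z table read at two decimal places gives a slightly different figure.

```python
from statistics import NormalDist

mean_weight = 73.5   # grams, from Example 1
sd_weight = 10.5     # grams, from Example 1
limit = 80           # grams

# Step 1: standardize the limit.
z = (limit - mean_weight) / sd_weight

# Step 2: evaluate the normal CDF at the limit instead of reading a table.
probability = NormalDist(mean_weight, sd_weight).cdf(limit)

print(f"z = {z:.4f}")
print(f"P(X <= {limit}) = {probability:.4f}")
```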

Example 2: Client expenditure is normally distributed with a mean of $28 and a standard deviation
of $7. Using Excel:
• Calculate the probability that a randomly selected client spends less than $33.
• Calculate the probability that a randomly selected client spends between $13 and $33.
• Calculate the probability that a randomly selected client spends more than $9.
• Calculate the dollar amount such that 72% of all clients spend no more than this.

(Source: Attached excel file, 2021)
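
The attached workbook is not reproduced here, so the sketch below shows one way the four questions of Example 2 could be answered in Python instead. It assumes only the stated mean of $28 and standard deviation of $7, and it is not necessarily how the Excel file computes them.

```python
from statistics import NormalDist

spending = NormalDist(mu=28, sigma=7)  # client expenditure in dollars, from Example 2

p_less_33 = spending.cdf(33)                     # P(X < 33)
p_between = spending.cdf(33) - spending.cdf(13)  # P(13 < X < 33)
p_more_9 = 1 - spending.cdf(9)                   # P(X > 9)
amount_72 = spending.inv_cdf(0.72)               # amount that 72% of clients do not exceed

print(f"P(X < 33)      = {p_less_33:.4f}")
print(f"P(13 < X < 33) = {p_between:.4f}")
print(f"P(X > 9)       = {p_more_9:.4f}")
print(f"72% of clients spend no more than ${amount_72:.2f}")
```

The first three questions use the cumulative distribution function, while the last one inverts it, which is why `inv_cdf` takes a probability and returns a dollar amount.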

2.2 Poisson distribution

A Poisson distribution is a type of probability distribution that can be used to show how many
times an event is expected to occur during a fixed interval of time. To put it another way, it is a
count distribution. Poisson distributions are frequently used to model independent events that
happen at a steady rate during a specific time period (statisticshowto.com, 2021).
Some example graphs of Poisson distributions are shown below:

(Source: statisticshowto.com, 2021)

(Source: sciencedirect.com, 2021)

Poisson distributions are defined only for integer values on the horizontal axis. Furthermore, λ
(sometimes written as μ) is the expected number of event occurrences (statisticshowto.com, 2021).
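
The text does not state the probability function, so the sketch below uses the standard Poisson formula P(X = k) = λ^k e^(−λ) / k!. The rate of 3 events per interval is a hypothetical value chosen only for illustration.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with expected count lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


lam = 3  # hypothetical: an average of 3 customer complaints per day

probabilities = [poisson_pmf(k, lam) for k in range(11)]
for k, p in enumerate(probabilities):
    print(f"P(X = {k:2d}) = {p:.4f}")
```

The probabilities are computed only at whole-number counts, which reflects the point above that the distribution is defined for integers.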

2.3 Binomial distribution

The binomial distribution is a probability distribution that expresses the likelihood that a value
will take one of two independent values under a given set of parameters or assumptions. The
underlying assumptions of the binomial distribution are that each trial has only one outcome, that
each trial has the same probability of success, and that the trials are mutually exclusive, or
independent of one another (investopedia.com, 2020).

Probability function

In general, the binomial distribution, in contrast to a continuous distribution such as the normal
distribution, is a common discrete distribution used in statistics. This is because the binomial
distribution counts only two states across the trials in the data, typically represented as 1 (for
a success) and 0 (for a failure). The binomial distribution therefore gives the probability of x
successes in n trials, given a success probability p for each trial (investopedia.com, 2020).

Besides that, binomial distributions must also satisfy the following three conditions:
• The number of observations or trials is fixed.
• Each observation or trial is independent of the others. To put it another way, no trial has any
bearing on the likelihood of the next trial.
• From one trial to the next, the likelihood of success (tails, heads, fail, or pass) is the same
(statisticshowto.com, 2021).
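
Under these three conditions, the probability of x successes in n trials is given by the standard binomial formula P(X = x) = C(n, x) p^x (1 − p)^(n − x). The sketch below applies it to a hypothetical quality check; the sample size and defect rate are assumptions made for illustration.

```python
import math


def binomial_pmf(x, n, p):
    """P(X = x): probability of exactly x successes in n independent trials."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)


n_items = 10        # hypothetical: 10 items inspected (fixed number of trials)
defect_rate = 0.1   # hypothetical: each item is defective with probability 0.1

p_two_defects = binomial_pmf(2, n_items, defect_rate)
p_at_most_one = sum(binomial_pmf(x, n_items, defect_rate) for x in (0, 1))

print(f"P(exactly 2 defects) = {p_two_defects:.4f}")
print(f"P(at most 1 defect)  = {p_at_most_one:.4f}")
```

Here a "success" is a defective item, which shows that the label is only a convention for one of the two possible outcomes.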

2.4 Inference
Statistical inference is the process of using data analysis to infer properties of an underlying
probability distribution. Inferential statistical analysis infers properties of a population, for
example by testing hypotheses and deriving estimates. It assumes that the observed data set is
sampled from a larger population (Upton, G. and Cook, I. 2008).

Apart from that, inferential statistics can be contrasted with descriptive statistics. Descriptive
statistics is concerned only with the properties of the observed data, and it does not assume that
the data come from a larger population. Besides that, in the context of machine learning, the term
inference is mostly used to mean making a prediction by evaluating an already trained model. In
this context, inferring the properties of the model is referred to as training or learning (rather
than inference), and using a model for prediction is referred to as inference (instead of
prediction) (courses.lumenlearning.com, n.d.).

3. Evaluations and recommendations for improving business planning through statistical methods above.
Evaluations: The probability distributions described above are highly valuable statistical tools
for estimating the risk hidden in business operations. Business stakeholders can use probability
distributions to forecast the most likely results and make better decisions. Besides that,
probability distributions can be used to develop scenario analyses for an organization. A scenario
analysis employs probability distributions to generate several theoretically distinct outcomes for
a given course of action or future event. Beyond scenario analysis, a probability distribution is
also a useful tool for evaluating risk (smallbusiness.chron.com, n.d.).

For example, consider a corporation that is thinking about expanding into a new market. If the
company has to make $500,000 in revenue to break even, and its probability distribution indicates
that there is a 10% chance that revenues will be less than $500,000, the corporation can estimate
the level of risk it will face if it enters that new market (smallbusiness.chron.com, n.d.).

Recommendations: In business, one practical application of probability distributions and scenario
analysis is forecasting future sales volumes. Although it is nearly impossible to predict the exact
amount of future sales, companies must be prepared for unexpected events. In particular, a scenario
analysis based on probability distributions can help a corporation frame its potential future sales
in terms of a likely sales level, a worst-case scenario, and a best-case scenario. By doing so, the
organization can build its business plans and strategies around the most probable scenario while
remaining aware of the alternatives (smallbusiness.chron.com, n.d.).
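
A scenario analysis of this kind can be sketched with a few percentiles of an assumed sales distribution. Everything numeric below is hypothetical: the normal model, its mean and standard deviation, the choice of the 5th and 95th percentiles as worst and best cases, and the break-even level are assumptions made for illustration and do not come from the ABC company data.

```python
from statistics import NormalDist

# Hypothetical forecast: monthly sales volume modelled as normally distributed.
sales = NormalDist(mu=10_000, sigma=1_500)

# Worst, likely, and best cases taken as the 5th, 50th, and 95th percentiles.
scenarios = {
    "worst case (5th percentile)": sales.inv_cdf(0.05),
    "likely case (median)": sales.inv_cdf(0.50),
    "best case (95th percentile)": sales.inv_cdf(0.95),
}

# Risk of falling short of a hypothetical break-even volume.
break_even_units = 8_000
risk_below_break_even = sales.cdf(break_even_units)

for name, units in scenarios.items():
    print(f"{name}: {units:,.0f} units")
print(f"Chance of selling fewer than {break_even_units:,} units: {risk_below_break_even:.1%}")
```

The same two calls cover both uses described above: `inv_cdf` frames the scenarios, and `cdf` turns a break-even level into a risk estimate.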

IV. Using appropriate charts and tables to communicate findings of given variables
1. Analysing data and interpreting results by using frequency distribution tables, graphs,
and charts
In statistics, the findings of data analysis are usually presented using the types of tables and
charts described below:

❖ Frequency table
A frequency table is a table that lists items and shows the number of times each item occurs.
Frequency tables can therefore be used to describe how often a specific type of element appears
within a data set. Frequency tables, often known as frequency distribution tables, are one of the
most fundamental methods for presenting descriptive statistics.
For example:

Frequency table of “Unit Price” variable
(Source: attached spv file, 2021)
Based on the frequency table above, analysts can easily see how often each price level of the
"Unit Price" variable appears in the data set.
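
A frequency table like this can also be built in a few lines of code. The sketch below uses a short hypothetical list of unit prices, because the ABC company data set itself is not reproduced here, and it adds percent and cumulative percent columns similar to those commonly shown in statistical software output.

```python
from collections import Counter

# Hypothetical unit prices; illustrative only, not the ABC company data.
unit_prices = [1.99, 4.99, 1.99, 19.99, 4.99, 1.99, 2.99, 8.99, 4.99, 1.99]

counts = Counter(unit_prices)
total = len(unit_prices)

# One row per distinct price: (price, frequency, percent of all observations).
frequency_table = [(price, n, 100 * n / total) for price, n in sorted(counts.items())]

print(f"{'Unit Price':>10} {'Frequency':>10} {'Percent':>8} {'Cumulative':>11}")
cumulative = 0.0
for price, n, percent in frequency_table:
    cumulative += percent
    print(f"{price:>10.2f} {n:>10d} {percent:>8.1f} {cumulative:>11.1f}")
```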

❖ Pie chart
A pie chart, also known as a circle chart, is a circular statistical graphic that is divided into
sectors or segments to illustrate numerical proportions (byjus.com, 2021). Each sector represents a
proportionate share of the whole. A pie chart is well suited to showing the composition of
something, and in such cases it can replace other graphs such as bar graphs, line plots, and
histograms (byjus.com, 2021).

PIE CHART OF ITEMS
Pen 8%, Binder 18%, Pen Set 14%, Eraser 10%, Pencil 24%, Brush set 6%, Desk 6%, Note Sticker 12%,
Ruler 2%

(Source: Attached excel file, 2021)

From the "Item" variable in the ABC company database, the pie chart above makes it easy to see how
many items the company currently distributes and what percentage each item accounts for.
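
The percentages in a pie chart come from dividing each category count by the total, and each sector's angle is the same share of 360 degrees. The sketch below illustrates this with hypothetical item counts; they are not the ABC company figures.

```python
# Hypothetical number of orders per item; illustrative only.
item_counts = {"Pencil": 20, "Binder": 15, "Pen Set": 10, "Eraser": 5}

total = sum(item_counts.values())

# Each sector's share of the whole, as a percentage and as an angle in degrees.
sectors = {
    item: {"percent": 100 * n / total, "angle": 360 * n / total}
    for item, n in item_counts.items()
}

for item, sector in sectors.items():
    print(f"{item:<8} {sector['percent']:5.1f}%  {sector['angle']:6.1f} degrees")
```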

❖ Bar chart
A bar chart is a visual representation of categorical data (continuous data can be made categorical
by auto-binning). The bar chart presents data as a series of bars, each of which represents a
particular category. Bar charts can use vertical columns, horizontal bars, comparative bars (several
bars showing a comparison between values), or stacked bars. The height or length of each bar
corresponds to the value of its category (investopedia.com, 2021).

[Bar chart: Frequency (vertical axis, 0 to 14) by Unit Price (horizontal axis: 1.29, 1.99, 2.99,
4.99, 8.99, 15.99, 19.99, 125)]

The bar chart of “Unit price” variable
(Source: attached excel file, 2021)
The bar chart above presents the "Unit price" variable from the ABC company data set as a frequency
distribution. It shows the frequency of each category, giving a clear visual summary of the 50
observations in the data set.

❖ Histogram with normal curve


A histogram is a graphical representation that divides a data set into ranges (bins) defined by the
user. The histogram, which resembles a bar chart in appearance, condenses a data series into an
easily understood visual by grouping many data points into logical ranges or bins
(investopedia.com, 2021).

In this method, a histogram is generated and a normal curve is then overlaid on it, scaled so that
the area under the curve equals the area of the histogram. A histogram and a normal curve can be
used together to evaluate whether data are normally distributed.
For example:

(Source: attached spv file, 2021)


The histogram above examines whether the residuals are normally distributed. The distribution curve
is bell-shaped, with a mean close to zero and a standard deviation close to 1. It can therefore be
concluded that the residuals are approximately normally distributed, so the assumption of normally
distributed residuals is not violated.
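
The idea of overlaying a normal curve with the same area as the histogram can be sketched without any plotting library. In the example below the residual values and bin edges are hypothetical; the point is the scaling step, where the fitted normal density is multiplied by the number of observations and the bin width.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical residuals; illustrative only, not the SPSS output.
residuals = [-1.8, -1.1, -0.9, -0.6, -0.4, -0.3, -0.1, 0.0, 0.1, 0.2,
             0.3, 0.5, 0.6, 0.8, 1.0, 1.3, 1.9]

bin_width = 1.0
bin_edges = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Normal curve fitted with the sample mean and sample standard deviation.
fitted = NormalDist(mean(residuals), stdev(residuals))
n = len(residuals)

rows = []
for left, right in zip(bin_edges, bin_edges[1:]):
    observed = sum(left <= r < right for r in residuals)
    midpoint = (left + right) / 2
    # Multiplying the density by n * bin_width gives the curve the same
    # total area as a count histogram with bins of this width.
    expected = n * bin_width * fitted.pdf(midpoint)
    rows.append((left, right, observed, expected))

for left, right, observed, expected in rows:
    print(f"[{left:5.1f}, {right:5.1f})  observed = {observed:2d}  normal curve = {expected:5.2f}")
```

If the observed counts track the curve heights closely across the bins, the histogram supports the assumption of approximate normality.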

❖ Scatter plots
A scatter plot is a chart type that is typically used to illustrate the relationship between
variables. The values of the variables are represented by dots, and the position of each dot on the
vertical and horizontal axes shows the values of the respective data point. Scatter plots therefore
use Cartesian coordinates to represent the values of the variables in a data set. Scatter plots are
also known as scattergrams, scatter graphs, or scatter charts (corporatefinanceinstitute.com, n.d.).

(Source: attached spv file, 2021)

Based on the scatter plot in the Multiple linear regression analysis section, the points are
scattered randomly around the zero line. This leads to the conclusion that the assumption of a
linear relationship between the independent and dependent variables is not violated.

2. The strengths and weaknesses of using different types of charts and tables

Frequency tables
Strengths:
• Simple to read and understand.
• Can help to identify obvious trends within a data set.
• Can be used to compare data between data sets of the same type.
Weaknesses:
• Complex data sets are difficult to interpret when presented in a frequency table.
• Require additional written or verbal description and can easily be distorted to create a false
impression.

Pie chart
Strengths:
• Present relative proportions of multiple categories of data.
• The circle's size can be made proportional to the quantity it represents.
• Easy to create and comprehend because of their extensive use in business and descriptive
statistics.
• Permit a visual check of the reasonableness or accuracy of the calculation (bizfluent.com, 2018).
Weaknesses:
• Outcomes are reported as percentages or ratios, so it is difficult to determine the exact value
expressed.
• A pie chart is less effective if it presents too many pieces of data.
• Adding data labels and numbers to the pie chart may not be helpful because it becomes cluttered
and difficult to read (bizfluent.com, 2018).

Bar chart
Strengths:
• Present relative numbers or proportions of many categories in the data set.
• Summarize a significant quantity of information from the data set in a visual,
easy-to-understand format.
• Make trends easier to highlight than tables do.
• Estimates can be made quickly and accurately (geographypoint.com, 2021).
Weaknesses:
• Fail to express main assumptions, causes, impacts, and patterns.
• Additional explanation is frequently required.
• Can be easily manipulated to create wrong impressions (geographypoint.com, 2021).

Histogram
Strengths:
• Because the intervals are always equal, histograms provide a more concrete kind of consistency,
which makes it easier to transfer data from frequency tables to histograms
(histogramsdennard.weebly.com, 2021).
• Large data sets can be graphed easily.
Weaknesses:
• Histograms are used only for numerical data.
• Unless the histogram is a frequency histogram, extracting the exact amount of "input" is
exceedingly difficult and almost impossible (histogramsdennard.weebly.com, 2021).

Scatter plots
Strengths:
• Excellent at displaying financial data.
• Can be used with almost any continuous scale data.
• Simple to comprehend and interpret (preservearticles.com, n.d.).
Weaknesses:
• These graphs are unable to determine the exact degree of correlation.
• A scatter plot is not a quantitative measure of the variables' relationship. It gives only a
qualitative picture of a quantitative change (preservearticles.com, n.d.).

3. The most effective way of communicating the results of the analysis


For the ABC company data set, charts and graphs are used for different purposes depending on the
requirements of each data analysis section. No single form of visualization communicates the
results of the whole analysis most effectively. Instead of relying on one type of graph, analysts
use a variety of tools to visualize and illustrate the data.

For example, for descriptive statistics analysis, frequency tables and bar charts would be appropriate
to present the results, as they highlight the trends and frequencies of the data in a clear, simple, and
understandable way.

Besides that, for analyzing the relationship between variables in a data set, as in correlation and
regression analysis, scatter plots may be the most suitable tool to present the results. A scatter
plot consists of multiple data points plotted on two axes, with each variable contributing multiple
observations. It is mainly used to analyze correlation and distribution, and to judge whether two
different variables are correlated. Scatter plots are also often used to support predictive models
and statistical decision-making in enterprises. As a result, analysts can understand the meaning of
the results more easily and draw conclusions more accurately.
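
A scatter plot shows whether two variables move together, but the strength of the relationship is usually quantified separately. The sketch below computes the Pearson correlation coefficient by hand for two hypothetical variables; the values are illustrative and are not taken from the ABC company data set.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / math.sqrt(variation_x * variation_y)


# Hypothetical observations: units ordered and order total (in dollars).
units = [1, 2, 3, 4, 5]
order_total = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(units, order_total)
print(f"Pearson r = {r:.4f}")
```

A value of r close to +1 or −1 corresponds to points lying close to a straight line on the scatter plot, while a value near 0 corresponds to a shapeless cloud of points.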

For the probability distribution analysis, I consider the histogram with a normal curve to be a
suitable tool for illustrating the results. Based on the histogram combined with the normal curve,
the analyst can judge whether the quality indicator or process behind the data set follows a normal
distribution. From there, they can make appropriate decisions to improve quality in the future.

V. Conclusion
To conclude, this study analyzed and evaluated raw business data using a variety of statistical
approaches, and presented the outcomes in appropriate tables and charts. Additionally, the
differences in application between descriptive, exploratory, and confirmatory analysis of business
and economic data were analyzed before evaluations and recommendations for improving business
planning through statistical methods were made. As a result, I better understand the nature of
statistics for management and can perform data analysis with the above statistical methods in SPSS.
From these findings, I have enough evidence and data to help the managers of ABC company make
business decisions and develop new strategies more effectively.

VI. References
statistics.laerd.com, 2021. Mean, Mode and Median - Measures of Central Tendency - When to use with Different Types of Variable and Skewed Distributions | Laerd Statistics. [online] Statistics.laerd.com. Available at: https://statistics.laerd.com/statistical-guides/measures-central-tendency-mean-mode-median.php [Accessed 20 June 2021].

byjus.com, 2021. Central Tendency Definition | Measures of Central Tendency & Examples. [online] BYJUS. Available at: https://byjus.com/maths/central-tendency/ [Accessed 20 June 2021].

onlinestatbook.com, n.d. Measures of Variability. [online] Onlinestatbook.com. Available at: https://onlinestatbook.com/2/summarizing_distributions/variability.html [Accessed 21 June 2021].

scalestatistics.com, 2021. Standard Deviation Gives Context to Where Observations Fall in a Distribution. [online] Statistician For Hire. Available at: https://www.scalestatistics.com/standard-deviation.html [Accessed 21 June 2021].

statisticshowto.com, 2021. Inferential Statistics: Definition, Uses. [online] Statistics How To. Available at: https://www.statisticshowto.com/probability-and-statistics/statistics-definitions/inferential-statistics/ [Accessed 21 June 2021].

Surbhi, S., 2017. Difference Between Population and Sample (with Comparison Chart) - Key Differences. [online] Key Differences. Available at: https://keydifferences.com/difference-between-population-and-sample.html [Accessed 21 June 2021].

stattrek.com, 2021. One-Sample t-Test: Definition. [online] Stattrek.com. Available at: https://stattrek.com/statistics/dictionary.aspx?definition=one-sample%20t-test [Accessed 25 June 2021].

nedarc.org, 2019. NEDARC - Hypothesis Testing. [online] Nedarc.org. Available at: https://www.nedarc.org/statisticalhelp/advancedstatisticaltopics/hypothesisTesting.html [Accessed 27 June 2021].

statisticssolutions.com, 2021. Independent Sample T-Test - Statistics Solutions. [online] Statistics Solutions. Available at: https://www.statisticssolutions.com/independent-sample-t-test/ [Accessed 2 July 2021].

statisticssolutions.com, 2021. Correlation (Pearson, Kendall, Spearman) - Statistics Solutions. [online] Statistics Solutions. Available at: https://www.statisticssolutions.com/free-resources/directory-of-statistical-analyses/correlation-pearson-kendall-spearman/ [Accessed 6 July 2021].

corporatefinanceinstitute.com, 2021. Regression Analysis - Formulas, Explanation, Examples and Definitions. [online] Corporate Finance Institute. Available at: https://corporatefinanceinstitute.com/resources/knowledge/finance/regression-analysis/ [Accessed 6 July 2021].

datascientistinsights.com, 2021. Six Types Of Analyses Every Data Scientist Should Know - Data Scientist Insights. [online] Data Scientist Insights. Available at: https://datascientistinsights.com/2013/01/29/six-types-of-analyses-every-data-scientist-should-know/ [Accessed 7 July 2021].

mymarketresearchmethods.com, 2020. Descriptive vs. Inferential Statistics Difference. [online] My Market Research Methods. Available at: https://www.mymarketresearchmethods.com/descriptive-inferential-statistics-difference/ [Accessed 7 July 2021].

online.ndm.edu, 2018. Exploratory Analysis vs. Confirmatory Analysis. [online] NDMU Online. Available at: https://online.ndm.edu/news/analytics/exploratory-analysis-vs-confirmatory-analysis/ [Accessed 7 July 2021].

sisense.com, 2021. Exploratory and Confirmatory Analysis: What’s the Difference? l Sisense. [online] Sisense. Available at: https://www.sisense.com/blog/exploratory-confirmatory-analysis-whatsdifference/ [Accessed 7 June 2021].

Frost, J., 2021. Measures of Variability: Range, Interquartile Range, Variance, and Standard Deviation - Statistics By Jim. [online] Statistics By Jim. Available at: https://statisticsbyjim.com/basics/variability-range-interquartile-variance-standard-deviation/ [Accessed 7 July 2021].

Frost, J., 2021. Understanding Probability Distributions - Statistics By Jim. [online] Statistics By Jim. Available at: https://statisticsbyjim.com/basics/probability-distributions/ [Accessed 7 July 2021].

investopedia.com, 2021. Normal Distribution. [online] Investopedia. Available at: https://www.investopedia.com/terms/n/normaldistribution.asp [Accessed 7 July 2021].

statisticshowto.com, 2021. Poisson Distribution / Poisson Curve: Simple Definition. [online] Statistics How To. Available at: https://www.statisticshowto.com/poisson-distribution/ [Accessed 7 July 2021].

sciencedirect.com, 2021. Poisson Distribution - an overview | ScienceDirect Topics. [online] Sciencedirect.com. Available at: https://www.sciencedirect.com/topics/mathematics/poisson-distribution [Accessed 7 July 2021].

investopedia.com, 2020. How Binomial Distribution Works. [online] Investopedia. Available at: https://www.investopedia.com/terms/b/binomialdistribution.asp [Accessed 7 July 2021].

statisticshowto.com, 2021. Binomial Distribution: Formula, What it is, How to use it. [online] Statistics How To. Available at: https://www.statisticshowto.com/probability-and-statistics/binomial-theorem/binomial-distribution-formula/ [Accessed 7 July 2021].

Upton, G. and Cook, I. 2008. Oxford Dictionary of Statistics, OUP.

courses.lumenlearning.com, n.d. Why It Matters: Linking Probability to Statistical Inference | Concepts in Statistics. [online] Courses.lumenlearning.com. Available at: https://courses.lumenlearning.com/wm-concepts-statistics/chapter/wim-linking-probability-to-statistical-inference/ [Accessed 8 July 2021].

smallbusiness.chron.com, n.d. The Role of Probability Distribution in Business Management. [online] Small Business - Chron.com. Available at: https://smallbusiness.chron.com/role-probability-distribution-business-management-26268.html [Accessed 8 July 2021].

byjus.com, 2021. Pie Chart (Definition, Formula, Examples) | Making a Pie Chart. [online] BYJUS. Available at: https://byjus.com/maths/pie-chart/ [Accessed 8 July 2021].

investopedia.com, 2021. Bar Graph Definition and Examples. [online] Investopedia. Available at: https://www.investopedia.com/terms/b/bar-graph.asp [Accessed 8 July 2021].

investopedia.com, 2021. Histogram Definition. [online] Investopedia. Available at: https://www.investopedia.com/terms/h/histogram.asp [Accessed 8 July 2021].

corporatefinanceinstitute.com, n.d. Scatter Plot - Overview, Applications, How To Create. [online] Corporate Finance Institute. Available at: https://corporatefinanceinstitute.com/resources/knowledge/other/scatter-plot/ [Accessed 8 July 2021].

bizfluent.com, 2018. Advantages & Disadvantages of a Pie Chart. [online] Bizfluent. Available at: https://bizfluent.com/list-6715678-advantages-disadvantages-pie-chart.html [Accessed 8 July 2021].

histogramsdennard.weebly.com, 2021. Pros and Cons of Histograms. [online] The Histogram. Available at: https://histogramsdennard.weebly.com/pros-and-cons-of-histograms.html [Accessed 8 July 2021].

preservearticles.com, n.d. What are the Merits and Demerits of Scatter Diagram?. [online] PreserveArticles.com: Preserving Your Articles for Eternity. Available at: https://www.preservearticles.com/articles/what-are-the-merits-and-demerits-of-scatter-diagram/7724 [Accessed 8 July 2021].

Holmes, A., Illowsky, B. and Dean, S., n.d. Introductory business statistics. Houston, Texas: OpenStax.
