Asm1 570
Student declaration
I certify that the assignment submission is entirely my own work and I fully understand the consequences of plagiarism. I understand that
making a false declaration is a form of malpractice.
I. Introduction
SSI Securities Corporation (SSI – HOSE) began operating in December 1999; after 21 years, the
corporation has become a leading financial institution in Vietnam's financial market, with the fastest
growth in charter capital, a 1,500-fold increase. The writer is a Research Analyst at SSI Securities
Corporation. The corporation plans to produce business reports on several Vietnamese
companies and industries by applying a range of statistical methods. The objectives of this study are to
evaluate and analyze business data and to derive knowledge from them. The report has three sections:
classifying data from a range of published sources, evaluating data using different methods of
analysis, and applying these methods to the dataset.
Information: a group of data that collectively carries a logical meaning; it is created when data are
processed, organised, or structured to provide context and meaning (Indeed Editorial Team, 2023).
Knowledge: unique to each individual, it is the accumulation of past experience and insight that shapes
the lens through which we interpret and assign meaning to information (Hilpinen, 1970).
2.2. Application
Firm size / Classification    Red River Delta    Grand Total
Do not answer                               1              1
Micro                                      58             58
Small                                     182            182
Medium                                     30             30
Large                                      31             31
Grand Total                               302            302
[Figure: Bar chart of the number of firms in the Red River Delta by firm size: Do not answer 1, Micro 58, Small 182, Medium 30, Large 31]
Information: The bar chart shows the classification of firms in the Red River Delta by the number of
employees of the entire enterprise.
Knowledge: Small companies make up the largest group of organizations, while micro, medium, and large
companies each number fewer than half as many as small companies. One business did not respond to
this survey.
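The step from the counts (data) to the statements above (information and knowledge) can be sketched in a few lines of Python. This is only an illustrative sketch: the counts are copied from the cross table, while the variable names are assumptions introduced here and are not part of the original Excel workflow.

```python
# Counts copied from the cross table (Red River Delta, 302 firms).
counts = {"Do not answer": 1, "Micro": 58, "Small": 182, "Medium": 30, "Large": 31}

total = sum(counts.values())
# Share of each size class, in percent of all firms.
shares = {size: 100 * n / total for size, n in counts.items()}

# The size class with the most firms.
largest = max(counts, key=counts.get)

# Check that every other class has fewer than half as many firms as "Small".
others_below_half = all(
    n < counts["Small"] / 2 for size, n in counts.items() if size != "Small"
)

for size, pct in shares.items():
    print(f"{size:<14}{counts[size]:>4}{pct:>7.1f}%")
```

Expressing the counts as shares makes the dominance of small firms easier to compare with other regions of different total size.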
2.3. Describe in detail the way of collecting databases and the transformation process of data
into information and knowledge
The tutor assigned the writer the Red River Delta region (code a2=1), which contains 302 firms.
In Excel, starting from the original data table, the writer isolated the records with code a2=1 by using
Filter. After filtering the dataset down to the Red River Delta, the writer used the classification table in
Decree 39/2018/ND-CP to classify firms into five groups.
The writer created two more columns, Size and Code, and then classified all businesses in the Red River
Delta by number of employees using the VLOOKUP function.
After highlighting the data and inserting a Pivot Table, the writer built a cross table with company size as
the rows and area (a2) as the column to show the number of companies in each category. The data in this
cross table were then used to draw a column chart showing the classification of enterprises by number of
employees.
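The Filter, VLOOKUP, and Pivot Table steps described above can be mirrored in a short Python sketch. Everything in it is an assumption made for illustration: the firm records are invented, and the size bands are placeholder values, not the actual thresholds of Decree 39/2018/ND-CP, which should be taken from the decree itself.

```python
from collections import Counter

# Hypothetical records: (region code a2, number of employees, or None if unanswered).
firms = [(1, 8), (1, 45), (1, 150), (1, 400), (2, 30), (1, None), (1, 60)]

# Placeholder upper bounds for each size class (illustrative only, not the decree's values).
SIZE_BANDS = [(10, "Micro"), (100, "Small"), (200, "Medium")]

def classify(employees):
    """Map an employee count to a size class, like a VLOOKUP on a threshold table."""
    if employees is None:
        return "Do not answer"
    for upper, label in SIZE_BANDS:
        if employees <= upper:
            return label
    return "Large"

# Filter step: keep only firms with a2 = 1.
red_river = [employees for region, employees in firms if region == 1]

# Pivot Table step: count the firms in each size class.
counts = Counter(classify(employees) for employees in red_river)
print(dict(counts))
```

Keeping the thresholds in one table, as the VLOOKUP approach does, means the classification can be updated by editing the table rather than the logic.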
After completing the chart, the writer analyzed the data as follows. The numbers are the raw data; from
these the writer learned how many businesses fall into each size class (Micro, Small, Medium, or Large)
based on the number of employees. Through these steps of data analysis and an understanding of the basic
information, the writer drew knowledge from the chart above, such as the differences between the size
classes and the fact that one business did not respond to this survey.
III. Three methods of analysis
3.1. Descriptive approach
Theory
According to Cavallaro & Fidell (1994), descriptive statistics have two parts: central tendency and
dispersion.
Central tendency concentrates on the single number that best represents the other numbers, and it has
three main measures: mean, median, and mode (Cavallaro & Fidell, 1994).
Firstly, the "mean" is the average: its value is found by adding all the numbers in the sequence and
dividing the sum by how many numbers there are (Cavallaro & Fidell, 1994, p. 140).
Secondly, the "median" is the middle number in the ordered number sequence (Cavallaro & Fidell, 1994,
p. 141). For example, among the numbers from 11 to 15, 13 is the middle number.
Finally, the "mode" is the most frequently occurring number (Cavallaro & Fidell, 1994, p. 141).
Dispersion focuses on how much the numbers in the distribution differ from one another.
Firstly, the range is the difference between the highest and lowest numbers in a set of numbers (Fisher &
Marshall, 2009).
Secondly, the standard deviation measures the typical difference of each score from the mean and is the
square root of the variance (Fisher & Marshall, 2009).
Thirdly, the geometric mean is determined by multiplying a series of values together and raising the
product to the power of the inverse of the series' length (Blokhin, 2021).
Finally, percentiles are percentages that show where a value falls when the data are ordered from lowest
to highest (Cavallaro & Fidell, 1994, p. 151).
Common statistical terms and their definitions (Fisher & Marshall, 2009)
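The measures defined in this section can be computed with Python's standard `statistics` module (version 3.8 or later). The sketch below is illustrative only: the five values are a hypothetical sample, chosen so that one value repeats and a mode exists, and the population (rather than sample) variance is an assumption made for simplicity.

```python
import statistics

# Hypothetical sample; 13 appears twice so that the mode is well defined.
data = [11, 12, 13, 13, 15]

# Central tendency
mean = statistics.mean(data)      # sum of the values divided by their count
median = statistics.median(data)  # middle value of the ordered data
mode = statistics.mode(data)      # most frequently occurring value

# Dispersion
value_range = max(data) - min(data)    # highest value minus lowest value
variance = statistics.pvariance(data)  # population variance
std_dev = statistics.pstdev(data)      # square root of the population variance

# Geometric mean: the n-th root of the product of the n values
geo_mean = statistics.geometric_mean(data)

# Quartiles: cut points at the 25th, 50th, and 75th percentiles
quartiles = statistics.quantiles(data, n=4)

print(mean, median, mode, value_range, round(std_dev, 3), round(geo_mean, 3), quartiles)
```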
Application
Number of monthly active Facebook users worldwide as of 4th quarter 2022 (in millions) (Statista, 2023)
Reported financial figures are a product of descriptive analysis. The following example is the number
of monthly active Facebook users worldwide as of Q4 2022 (in millions).
Facebook measures monthly active users, defined as users who have logged in within the last 30 days. This
number increased sharply from the third quarter of 2008 to the fourth quarter of 2022, which suggests that
the company's advertising and application strategy has been effective.
This approach has several advantages. The first is a high degree of objectivity and neutrality, because a
descriptive study collects data on either many variables or a single variable (Baha, 2016, p. 08). The
second is that it helps researchers understand the relationship between variables more easily (Baha, 2016,
p. 08). Additionally, descriptive statistics can be used with both qualitative and quantitative data to
describe the characteristics of a population (Baha, 2016, p. 08).
However, this method also has some disadvantages. The first is the truthfulness of the participants
(Baha, 2016, p. 09). For example, some people do not answer truthfully, which introduces errors into the
research data (Baha, 2016, p. 09). Another disadvantage is the small scale: because descriptive statistics
are often applied to a small sample, the method can only describe situations that are relevant to that
sample (Baha, 2016, p. 09). In addition, the number of participants or the amount of data is also an issue,
because too few participants or data points are not enough for the research (Baha, 2016, p. 10). Finally,
external factors are an issue, because this method can only demonstrate the relationship between two
variables in the data and does not account for other external aspects that can affect the research (Baha,
2016, p. 10).
3.2. Confirmatory approach
Theory
Confirmatory analysis is a process used to evaluate evidence by challenging hypotheses about the data
(Miksza & Elpus, 2018). This is the stage at which researchers check their conclusions and assess the
value of their work; it involves processes such as hypothesis testing, producing estimates, and
regression analysis (Miksza & Elpus, 2018).
Application
Confirmatory Factor Analysis (CFA) (Chegg, 2011)
First of all, the Pearson correlation is 0.153, which means the relationship between the crime rate and
unemployment is weak.
The second point is whether the relationship between the crime rate and unemployment is statistically
significant. As a rule of thumb, a relationship is treated as significant when the significance value is
below 0.05.
In this case, the two-tailed significance value (0.288) is greater than 0.05; therefore, we cannot be 95%
confident that the relationship holds for the population. Overall, this is a weak positive correlation that is
not statistically significant.
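The two checks used in this interpretation, the size of the Pearson coefficient and the comparison of the significance value with 0.05, can be sketched as follows. The function names and the data are assumptions made for illustration; the figures are not the crime and unemployment values behind the output discussed above, and only the value 0.288 is taken from the text.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sum of the products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(spread_x * spread_y)

def is_significant(p_value, alpha=0.05):
    """Decision rule: significant when the p-value is below alpha."""
    return p_value < alpha

# Hypothetical figures, used only to show the calculation.
unemployment = [4.0, 5.5, 6.1, 7.2, 8.0]
crime_rate = [30.0, 28.0, 35.0, 31.0, 36.0]

r = pearson_r(unemployment, crime_rate)
print(f"r = {r:.3f}")
# 0.288 is the two-tailed significance value reported in the text.
print("significant at the 5% level?", is_significant(0.288))
```

The sketch deliberately does not compute the p-value itself, since that requires the sample size and a t-distribution, neither of which is given in the source.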
According to Prudon (2015), this method has two advantages. The first is that it provides accurate
information in appropriate situations. The second is its ‘well-established theory and methods’.
However, confirmatory analysis also has three disadvantages. The first is an impression of accuracy that
is deceptive in less-than-ideal conditions (Prudon, 2015). The second is that the analysis is driven by
preconceived ideas, and the third is that it makes unexpected effects difficult to discover (Prudon, 2015).
3.3. Exploratory approach
Theory
Exploratory analysis is a method used to explore large data sets in order to make conclusions or
predictions (Miksza & Elpus, 2018). It helps researchers search for information or trends that assist
them in drawing conclusions; moreover, the method covers many tasks, including detecting errors, finding
missing figures, and identifying crucial variables in the data set (Miksza & Elpus, 2018).
Additionally, exploratory analysis is also relevant to presenting a final assessment.
Application
Scatter plot of the model estimated IQ scores with respect to observed values using multimodal
neuroimaging features (Jiang, et al, 2020)
This scatter plot covers two aspects, region and network, which can help researchers predict the
intelligence quotient (IQ) scores of males and females. The authors of the study employed
connectome-based predictive modeling (CPM) to predict the IQ scores of 166 males and 160 females
separately, using resting-state functional connectivity, grey matter cortical thickness, or both. The
relationship between region and the communicated network is a positive correlation. Going further, the
writer forms a hypothesis for the population: IQ can be affected by region and communicated network.
This approach has several advantages. The first is that it provides valuable knowledge about the data
(Voxco, 2021). The second is that it makes it easier for researchers to choose functions in principal
component analysis (Voxco, 2021). Finally, visualization is an efficient technique for identifying
outliers (Voxco, 2021).
However, exploratory statistics also have two disadvantages. The first concerns processing: if the
manipulation steps in exploratory data analysis (EDA) contain errors, they can lead to problems (Voxco,
2021). The second is that EDA is not effective when researchers want to process high-dimensional data
(Voxco, 2021).
IV. Dataset
4.1. Research question
Research question: Survey on the number of employees working in enterprises in the Red River Delta
(Vietnam) and how much profit their labor productivity has generated for businesses in 2015.
Population: All businesses located in the Red River Delta region of Vietnam.
The Enterprise Surveys are stratified by geographical location, in this case the Red River Delta region.
Geographical stratification reflects the distribution of each country's non-agricultural economic
activity, which in most circumstances means covering the country's major metropolitan areas.
Stratified random sampling was preferred over simple random sampling for several reasons:
• To obtain unbiased estimates for various subgroups of the population with a known degree of precision.
• To ensure that the final total sample includes establishments from all industries and is not
concentrated in one or two industries/sizes/regions.
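The second reason in the list above can be illustrated with a minimal sketch of stratified random sampling. It is not the survey's actual procedure: the sampling frame, the industry names, the function name, and the equal allocation of three firms per stratum are all assumptions made for illustration (real surveys often allocate in proportion to stratum size).

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, per_stratum, seed=0):
    """Draw up to `per_stratum` units at random from every stratum."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for name in sorted(strata):
        members = strata[name]
        # Small strata contribute all of their members.
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample

# Hypothetical sampling frame of (firm id, industry); one industry dominates.
frame = (
    [(i, "Food") for i in range(50)]
    + [(i, "Garments") for i in range(50, 58)]
    + [(i, "Textiles") for i in range(58, 60)]
)

sample = stratified_sample(frame, stratum_of=lambda firm: firm[1], per_stratum=3)
industries_in_sample = {industry for _, industry in sample}
print(len(sample), sorted(industries_in_sample))
```

With a simple random draw of the same size from this frame, the two Textiles firms could easily be missed altogether, which is the concentration problem that stratification avoids.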
4.2. Apply statistical tools
4.2.1. Descriptive statistics for the number of employees
[Figure: Chart of firms by industry; category labels are not recoverable]
In general, the businesses are concentrated in a few industries rather than spread across many.
Non-metallic mineral products & Fabricated metal products account for the highest percentage in the above
table (about 40%), followed by Garments with 31% and Food with 28%.
In contrast, the remaining industries, such as Textiles or Plastic & rubber, each account for less than
5%, and some industries account for only 1%, an extremely low percentage. Non-metallic mineral
products & Fabricated metal products are therefore roughly 8 to 40 times higher than these other
industries.
[Figure: Scatter plot of sales (vertical axis, 0 to 6.000) against hours operating per week (horizontal axis, 0 to 200)]
The scatter plot demonstrates the relationship between sales and hours operating per week. The value of
the correlation coefficient is 0.32325; therefore, this relationship is considered weak, and the two
variables depend only slightly on each other. For example, if the hours worked by employees increase, the
sales of that firm may not rise.
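One way to make "weak" concrete, assuming a simple straight-line relationship between the two variables, is to square the reported correlation coefficient. In a simple linear fit, the square of the Pearson coefficient gives the share of the variation in one variable that is accounted for by the other:

```latex
r = 0.32325 \quad\Longrightarrow\quad r^{2} = 0.32325^{2} \approx 0.1045
```

Under that assumption, only about 10% of the variation in sales is associated with hours operating per week, and roughly 90% is left to other factors.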
V. Conclusion
This research examined the function and nature of business and economic data/information. In addition,
demonstrations of data analysis using many analytical approaches and a description of specific applications
for a dataset have been provided.
REFERENCES
[1] Cavallaro, M. and Fidell, L. (1994) “Basic Descriptive Statistics: Commonly Encountered Terms and
Examples,” American Journal of EEG Technology, 34(3), pp. 138–152. Available at:
https://doi.org/10.1080/00029238.1994.11080483.
[2] Hill, J. (2021) “Data vs Information: What’s the Difference?,” Bloomfire, 15 June. Available at:
https://bloomfire.com/blog/data-vs-information/ (Accessed: March 20, 2023).
[3] Blokhin, A. (2022) “When to Apply the Geometric Mean: Key Examples,” Investopedia, 3 June.
Available at: https://www.investopedia.com/ask/answers/060115/what-are-some-examples-applications-geometric-mean.asp (Accessed: March 20, 2023).
[4] Indeed Editorial Team (2023) “6 Types of Information (With Examples),” Indeed Career Guide, 28
February. Available at: https://www.indeed.com/career-advice/career-development/types-of-information
(Accessed: March 20, 2023).
[5] Hilpinen, R. (1970) “Knowing that one knows and the classical definition of knowledge,” Synthese,
21(2), pp. 109–132. Available at: https://doi.org/10.1007/bf00413541.
[6] Fisher, M.J. and Marshall, A.P. (2009) “Understanding descriptive statistics,” Australian Critical
Care, 22(2), pp. 93–97. Available at: https://doi.org/10.1016/j.aucc.2008.11.003.
[7] Baha, H. (2016) An Introduction of Descriptive Analysis, its advantages and disadvantages [Online].
Available at: https://www.academia.edu/25307454/Title_An_Introduction_on_Descriptive_Analysis_Its_a%20dvantages_and_disadvantages (Accessed: May 22, 2023).
[8] Chegg (2020) Assume that a normal distribution of data has a mean of 14 and a standard deviation of
4. Use the 68−95−99.7 rule to find the percentage of values that lie above 26, Chegg.com.
Available at: https://www.chegg.com/homework-help/questions-and-answers/assume-normal-distribution-data-mean-14-standard-deviation-4-use-68-95-997-rule-find-perce-q49082664 (Accessed: April 2, 2023).
[9] Miksza, P. and Elpus, K. (2018) Exploratory and Confirmatory Factor Analysis. Oxford University
Press. Available at: http://dx.doi.org/10.1093/oso/9780199391905.003.0013 (Accessed: March 22, 2023).
[10] Prudon, P. (2015) “Confirmatory Factor Analysis as a Tool in Research Using Questionnaires: A
Critique,” Comprehensive Psychology, 4(10), p. 03.CP.4.10. Available at:
https://doi.org/10.2466/03.cp.4.10.
[11] Voxco (2021) Exploratory Research: Pros And Cons, Voxco. Available at:
https://www.voxco.com/blog/exploratory-research-pros-and-cons/ (Accessed: March 25, 2023).
[12] Jiang, R. (2020) Scatter plot of the model-estimated IQ scores with respect to observed...,
ResearchGate. Available at: https://www.researchgate.net/figure/Scatter-plot-of-the-model-estimated-IQ-scores-with-respect-to-observed-values-using_fig3_334262517 (Accessed: March 25, 2023).