Assignment 1 (Chapter 1-4) Fin 534
KELANTAN
PREPARED BY:
NAME STUDENT ID
MUHAMMAD FAIZ FAKHRUDDIN BIN SHIHABUDDIN 2019845414
MUHAMMAD ASYWAM BIN ASRI 2019861586
MUHAMMAD HAZIM BIN MOHD ASRI 2019620422
MUHAMMAD DZAHIRULHAQ BIN ZUHAIDI 2019800186
ZURAINA BINTI MAT DAUD 2020678768
CLASS:
DBMF8A
PREPARED FOR:
WAN YUSROL RIZAL BIN W. YUSOF
SUBMISSION DATE:
13th JANUARY 2023
TABLE OF CONTENTS
CONTENTS
CHAPTER 1 INTRODUCTION OF BUSINESS ANALYTICS
1.0 Introduction
2.0 Introduction of Business Analytics
3.0 Evolution of Business Analytics
4.0 Scope of Business Analytics
4.1 Business Analytics Tools
5.0 Data for Business Analytics
5.1 Classification of Data
5.2 Data Reliability and Validity
6.0 Models in Business Analytics
6.1 Uncertainty and Risk
6.2 Predictive Models
7.0 Problem Solving with Analytics
13.3 Frequency Distribution
13.4 Relative Frequency Distribution
13.5 Excel Histogram Tool
13.6 Cumulative Relative Frequency Distributions
13.7 Percentiles
13.8 Quartiles
13.9 Cross-Tabulation
13.10 Using Pivot Tables
13.11 Pivot Charts
13.12 Slicers and PivotTable Dashboard
14.0 Conclusion of Data Visualization and Exploration
23.0 Outliers
24.0 Statistical Thinking in Business Decisions
24.1 Variability in Samples
25.0 Conclusion
CHAPTER 1 BUSINESS ANALYTICS
1.0 Introduction
As Big Data continues to grow, so does the demand for business analytics professionals.
A deluge of abstract information exists, spurred over two decades ago by the internet and
accelerating at a significantly faster pace over the past few years. Long-used database
management and analysis techniques no longer suffice, particularly where unstructured
data such as product reviews and social media posts are concerned. Data analysis careers
in business analytics, along with business intelligence and data science roles, address
new methods for organizing, gaining insights from, and making predictions with this
steadily increasing amount of information, and they frequently draw on computer science
and other technical knowledge.
Whether we have spent the past few decades in an analytics role or have been
exploring the myriad computer science applications, we might be wondering, “What is
business analytics?” Here is a primer on this burgeoning field and the skills needed to start
a career.
2.0 Introduction of Business Analytics
Business analytics is the process of transforming data into insights to improve business
decisions. Data management, data visualization, predictive modelling, data mining,
forecasting, simulation, and optimization are some of the tools used to create insights from
data. Yet, while business analytics leans heavily on statistical, quantitative, and operational
analysis, the end result is developing data visualizations that present our findings and shape
business decisions. For this reason, balancing our technical background with strong
communication skills is imperative to do well in this field.
Furthermore, business analytics is the use of data, information technology,
statistical analysis, quantitative methods, and mathematical or computer-based models to
help managers gain improved insights about their business operations and make better,
fact-based decisions. Examples of decision making supported by business analytics include
pricing, customer segmentation, merchandising, location, supply chain, finance, marketing,
and HR. Business analytics is also supported by tools such as Microsoft Excel and Excel
add-ins, commercial statistical packages (e.g. SAS or Minitab), and more complex software.
Business analytics has existed for a very long time and has evolved with the availability
of newer and better technologies. It has its roots in operations research, which was
used extensively during World War II as an analytical way of looking at data to conduct
military operations. Over time, the technique began to be applied to business, and
operations research evolved into management science. The basis of management science
remained the same as that of operations research: data, decision-making models, and so on.
As economies developed and companies became more competitive, management science
evolved into business intelligence, decision support systems, and PC software.
Statistics provides the basic tools of description, exploration, estimation, and inference,
as well as more advanced techniques such as regression, forecasting, and data mining. Data
mining seeks to understand characteristics and patterns among variables in large databases
using a variety of statistical and analytical tools. Business intelligence, or information
systems, covers the collection, management, analysis, and reporting of data in every
discipline of business. Modelling and optimization techniques translate real problems into
mathematics, spreadsheets, or other computer languages and use them to find the best, or
optimal, solutions and decisions. Together with visualization, these tools can be used to
assess the sensitivity of optimization models to changes in data inputs. Simulation and risk
analysis uses spreadsheet models and statistical analysis to examine the impact of
uncertainty in the estimates, and their potential interaction with one another, on the
output variables of interest.
One of the most cited examples of analytics in practice is Harrah’s Entertainment. Harrah’s
owns numerous hotels and casinos and uses analytics to support revenue management activities,
which involve selling the right resources to the right customer at the right price to maximize
revenue and profit. The gaming industry views hotel rooms as incentives or rewards to
support casino gaming activities and revenues, not as revenue-maximizing assets.
Therefore, Harrah’s objective is to set room rates and accept reservations so as to maximize
the expected gaming profits from customers. The process begins with collecting and tracking
customers’ activities (playing slot machines and casino games) using Harrah’s “Total
Rewards” card program, a customer loyalty program that provides rewards such as meals,
discounted rooms, and other perks to customers based on the amount of money and time
they spend at Harrah’s. The data collected are used to segment customers into more than
20 groups based on their expected gaming activities. For each customer segment, analytics
forecasts demand for hotel rooms by arrival date and length of stay. Then Harrah’s uses a
prescriptive model to set prices and allocate rooms to these customer segments. For
example, the system might offer complimentary rooms to customers who are expected to
generate a gaming profit of at least $400 but charge $325 for a room if the profit is expected
to be only $100. Marketing can use the information to send promotional offers to targeted
customer segments if it identifies low-occupancy rates for specific dates.
Business analytics begins with the collection, organization, and manipulation of data and is
supported by three major components:
i. Descriptive Analytics
The use of data to understand past and current business performance and make
informed decisions. These techniques categorize, characterize, consolidate, and
classify data to convert it into useful information for the purpose of
understanding and analysing business performance. Descriptive analytics
summarizes data into meaningful charts and reports, for example on
budgets, sales, revenue, or cost.
Although the tools used in descriptive, predictive and prescriptive analytics are
different, many applications involve all three.
EXAMPLE:
Data are numerical facts and figures that are collected through some type of
measurement process. Information comes from analysing data, that is, extracting
meaning from data to support evaluation and decision making. For example:
i) Annual reports summarize data about companies’ profitability and market share
both in numerical form and in charts and graphs to communicate with
shareholders.
ii) Marketing researchers collect and analyse extensive customer data. These data
often consist of demographics, preferences and opinions, transaction and payment
history, shopping behaviour, and a lot more. Such data may be collected by
surveys, personal interviews, focus groups, or from shopper loyalty cards.
iii) A data set is simply a collection of data e.g. Student record.
iv) A database is a collection of related files containing records on people, places, or
things e.g. SIMS (Student Information Management Systems).
v) Measurement is the act of obtaining data associated with a metric. Measures are
numerical values associated with a metric. Metrics can be either discrete or
continuous.
a. A discrete metric is one that is derived from counting something e.g. a
delivery is either on time or not; an order is complete or incomplete.
b. Continuous metrics are based on a continuous scale of measurement. Any
metrics involving dollars, length, time, volume, or weight, for example are
continuous.
Interval Data can be used to rank students, but only differences between scores
provide information on how much better one student performed over
another. In contrast to ordinal data, interval data allow meaningful
comparison of ranges, averages, and other statistics.
Ratio Data are continuous and have a natural zero. Most business and economic
data, such as dollars and time, fall into this category e.g. the measure
dollars has an absolute zero.
it is used to assess customer dissatisfaction, as many calls may be
simple queries. A survey question that asks a customer to rate the
quality of the food in a restaurant may be neither reliable (because
different customers may have conflicting perceptions) nor valid (if
the intent is to measure customer satisfaction, as satisfaction
generally includes other elements of service besides food).
6.1 Uncertainty and Risk
The future is always uncertain. Predictive models incorporate uncertainty and help
decision makers analyse the risks associated with their decisions. Uncertainty is
imperfect knowledge of what will happen. Risk is associated with the consequences
and likelihood of what might happen. For example, the change in the stock price
of Apple on the next day of trading is uncertain. However, if you own Apple stock,
then you face the risk of losing money if the stock price falls. If you don’t own any
stock, the price is still uncertain although you would not have any risk.
A prescriptive decision model helps decision makers identify the best solution
to a decision problem. Optimization is the process of finding a set of values for
decision variables that minimize or maximize some quantity of interest, such as profit,
revenue, cost, or time, which we call the objective function. Any set of
decision variables that optimizes the objective function is called an optimal
solution. In a highly competitive world where one percentage point can mean a
difference of hundreds of thousands of dollars or more, knowing the best solution
can mean the difference between success and failure. Prescriptive decision models
can be either deterministic or stochastic. A deterministic model is one in which all
model input information is either known or assumed to be known with certainty. A
stochastic model is one in which some of the model input information is uncertain.
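The idea of an objective function and an optimal solution can be sketched in a few
lines of Python. The linear demand curve, the unit cost, and the grid of candidate
prices below are hypothetical numbers chosen only for illustration. Because every
input is assumed to be known with certainty, this is a deterministic model.

```python
# Hypothetical deterministic pricing model (all numbers are illustrative assumptions).
UNIT_COST = 40.0


def demand(price):
    """Assumed linear demand curve: units sold fall as the price rises."""
    return max(0.0, 1000.0 - 5.0 * price)


def profit(price):
    """Objective function: the quantity we want to maximize."""
    return (price - UNIT_COST) * demand(price)


# Decision variable: price. Search a grid of whole-number prices from 40 to 200.
candidate_prices = range(40, 201)
optimal_price = max(candidate_prices, key=profit)

print(optimal_price, profit(optimal_price))
```

A stochastic version would replace the fixed demand curve with one whose inputs are
drawn from a probability distribution, so the profit at each price becomes uncertain.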
CHAPTER 2 ANALYTICS ON SPREADSHEETS
Assuming that you have two models for predicting demand as a function of
prices:
9.0 Basic Excel Functions
9.1 Other ‘IF’ – type Functions
ii. IF Function
10.0 Excel Lookup Functions
CHAPTER 3 DATA VISUALIZATION AND EXPLORATION
Converting data into information to understand past and current performance is the core
of descriptive analytics and is vital to making good business decisions. Techniques for
doing this range from plotting data on charts to extracting data from databases and
manipulating and summarizing data. Data visualization is the process of displaying data
in a meaningful fashion to provide insights that will support better decisions. Visualizing
data provides a way of communicating data at all levels of a business and can reveal
surprising patterns and relationships.
This does not mean that users can simply type in a random request. For a database to
understand a request, it must receive a query written in a predefined code. That code is
a query language.
13.5 Excel Histogram Tool
A frequency distribution and histogram can be created using the Analysis ToolPak
in Excel. To do this, click the Data Analysis button in the Analysis group
under the Data tab in the Excel menu bar and select Histogram from the list. In
the dialog box, specify the Input Range corresponding to the data. If you include
the column header, then also check the Labels box so Excel knows that the range
contains a label. The Bin Range defines the groups used for the frequency
distribution. If you do not specify a Bin Range, Excel will automatically
determine bin values for the frequency distribution and histogram, which often
results in a rather poor choice. If you have discrete values, set up a column of
these values in your spreadsheet for the bin range and specify this range in the
Bin Range field. Tick the Chart Output box to display a histogram in addition to
the frequency distribution. You may also sort the values as a Pareto chart and
display the cumulative frequencies by checking the additional boxes.
The cumulative relative frequency represents the proportion of the total number
of observations that fall at or below the upper limit of each group. A tabular
summary of cumulative relative frequencies is called a cumulative relative frequency
distribution.
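Outside Excel, the same three columns (frequency, relative frequency, and
cumulative relative frequency) can be computed with a short Python sketch. The
observations below are hypothetical, and each distinct value is treated as its own
group, as suggested above for discrete data.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical discrete observations (e.g. number of items per order).
observations = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]


def frequency_table(values):
    """Return rows of (value, frequency, relative freq., cumulative relative freq.)."""
    counts = Counter(values)
    total = len(values)
    groups = sorted(counts)
    frequencies = [counts[g] for g in groups]
    relative = [f / total for f in frequencies]
    # Accumulate the counts first, then divide, to limit floating-point drift.
    cumulative = [c / total for c in accumulate(frequencies)]
    return list(zip(groups, frequencies, relative, cumulative))


table = frequency_table(observations)
for row in table:
    print(row)
```

The last cumulative relative frequency is always 1, because every observation falls
at or below the largest group.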
13.7 Percentiles
A percentile specifies the percentage of observations (for example, other test
takers) that fall at or below a particular value, such as the score of a particular
individual. The most common way to compute the kth percentile is to order the
data values from smallest to largest and calculate the rank of the kth percentile
using the formula:
rank = nk/100 + 0.5
where n is the number of observations.
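The rank formula nk/100 + 0.5 can be tried out in Python as follows. Rounding the
rank to the nearest whole position and clamping it to the range of the data are
assumptions made for this sketch, and the scores are hypothetical.

```python
import math


def kth_percentile(values, k):
    """Value at rank n*k/100 + 0.5 (rounded to the nearest position) in the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    rank = n * k / 100 + 0.5
    # Round half up, then clamp to a valid 1-based position (an assumption of this sketch).
    position = min(max(math.floor(rank + 0.5), 1), n)
    return ordered[position - 1]


scores = [55, 62, 68, 70, 73, 75, 78, 81, 86, 94]  # hypothetical test scores
print(kth_percentile(scores, 90))
```

Other software may interpolate between neighbouring values instead of rounding, so
results can differ slightly between tools.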
13.8 Quartiles
4) The 100th percentile is called the fourth quartile, Q4.
One-fourth of the data fall below the first quartile, one-half are below the second,
and three-fourths are below the third quartile. We may compute quartiles using
the Excel function QUARTILE.INC(array, quart), where array specifies the
range of the data and quart is a whole number between 1 and 4 designating the
desired quartile.
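For comparison, quartiles can also be computed with Python's standard library. The
sketch below uses statistics.quantiles with method="inclusive", which is understood
here to interpolate in the same way as QUARTILE.INC; that equivalence is an
assumption of the example, and the data are hypothetical.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical observations

# n=4 splits the data into four equal parts and returns the three cut points.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

print(q1, q2, q3)
```

The second quartile, q2, is the median of the data.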
Pivot tables allow you to create summaries and charts of key information in the
data. Pivot tables can be used to quickly create cross-tabulations and to drill
down into a large set of data in numerous ways.
A pivot chart is the visual representation of a pivot table in Excel. Pivot charts
and pivot tables are connected with each other.
A slicer is a tool for drilling down to “slice” a pivot table and display a subset
of the data. To create a slicer for any of the columns in the database, click on the
pivot table and choose Insert Slicer from the Analyze tab in the PivotTable Tools
ribbon.
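What a pivot table does when it builds a cross-tabulation can be imitated in a few
lines of Python. The sales records, region names, and product codes below are
hypothetical, and the sketch only counts records; it does not reproduce Excel's
other summary options.

```python
from collections import Counter

# Hypothetical sales records: (region, product).
records = [
    ("North", "A"), ("North", "B"), ("South", "A"),
    ("South", "A"), ("North", "A"), ("South", "B"),
]

# Count each (row, column) combination, like a pivot table set to "Count".
cell_counts = Counter(records)
regions = sorted({region for region, _ in records})
products = sorted({product for _, product in records})

cross_tab = {
    region: {product: cell_counts[(region, product)] for product in products}
    for region in regions
}

# "Slicing" on one region shows only that subset of the summary.
north_only = cross_tab["North"]
print(cross_tab)
```

Selecting cross_tab["North"] plays the role of a slicer: it narrows the summary to
one value of the row field.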
Good data visualization should communicate a data set clearly and effectively by using
graphics. The best visualizations make it easy to comprehend data at a glance.
CHAPTER 4: DESCRIPTIVE STATISTICAL MEASURES
15.0 Introduction
Frequency distributions, histograms, and cross-tabulations are tabular and visual tools
of descriptive statistics. In this chapter, we introduce numerical measures that provide
an effective and efficient way of obtaining meaningful information from data.
Population is the entire set of items from which you draw data for a statistical study. It
can be a group of individuals, a set of items, etc. It makes up the data pool for a study.
Generally, population refers to the people who live in a particular area at a specific time.
But in statistics, population refers to data on your study of interest. It can be a group of
individuals, objects, events, organizations, etc. You use populations to draw
conclusions.
Figure 1: Population
Depending on the problem statement, data from each of these students is collected.
An example is the students who speak Hindi among the students of a school.
For the above situation, it is easy to collect data. The population is small and willing to
provide data and can be contacted. The data collected will be complete and reliable.
If you had to collect the same data from a larger population, say the entire country of
India, it would be impossible to draw reliable conclusions because of geographical and
accessibility constraints, not to mention time and resource constraints. A lot of data
would be missing or might be unreliable. Furthermore, due to accessibility issues,
marginalized tribes or villages might not provide data at all, making the data biased
towards certain regions or groups.
What is a Sample?
The sample is an unbiased subset of the population that best represents the whole data.
To overcome the restraints of a population, you can sometimes collect data from a
subset of your population and then consider it as the general norm. You collect the
subset information from the groups who have taken part in the study, making the data
reliable. The results obtained for different groups who took part in the study can be
extrapolated to generalize for the population.
Figure 2: Sample
The process of collecting data from a small subsection of the population and then using
it to generalize over the entire set is called Sampling.
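A small simulation illustrates sampling. The population below is synthetic (random
ages between 18 and 80), and the sample size of 200 and the fixed seed are arbitrary
choices made for this sketch.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: ages of 10,000 people (synthetic, not real data).
population = [random.randint(18, 80) for _ in range(10_000)]

# Simple random sample: every member has the same chance of being selected.
sample = random.sample(population, k=200)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)
print(round(population_mean, 2), round(sample_mean, 2))
```

The sample mean is usually close to the population mean even though only 2% of the
population was examined, which is what makes generalizing from a sample practical.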
Measures of location summarize a list of numbers by a "typical" value. The three most
common measures of location are the mean, the median, and the mode. The mean is the
sum of the values, divided by the number of values. It has the smallest possible sum of
squared differences from members of the list
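The three measures of location, and the least-squares property of the mean, can be
checked with Python's statistics module. The data values below are hypothetical.

```python
import statistics

values = [2, 3, 3, 5, 7, 10]  # hypothetical data

mean = statistics.mean(values)      # sum of the values / number of values
median = statistics.median(values)  # middle of the sorted list
mode = statistics.mode(values)      # most frequent value


def sum_squared_diff(center):
    """Sum of squared differences between each value and a candidate center."""
    return sum((v - center) ** 2 for v in values)


# For this data the mean gives a smaller sum than the median or the mode.
print(mean, median, mode)
print(sum_squared_diff(mean), sum_squared_diff(median), sum_squared_diff(mode))
```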
18.0 Measure of Dispersion
Dispersion is the state of getting dispersed or spread. Statistical dispersion means the
extent to which numerical data is likely to vary about an average value. In other words,
dispersion helps to understand the distribution of the data.
In statistics, the measures of dispersion help to interpret the variability of data, i.e. to
know how homogeneous or heterogeneous the data are. In simple terms, they show
how squeezed or scattered the variable is.
Variance: Subtract the mean from each value in the data set, square each
difference, add the squares, and finally divide the sum by the total
number of values in the data set to get the variance. Variance (σ²) =
∑(X − μ)²/N
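The variance steps can be followed one at a time in Python. The five values below
are hypothetical and are treated as a complete population, so the sum is divided
by N.

```python
import statistics

values = [4, 8, 6, 5, 7]  # hypothetical population of N = 5 observations

mu = sum(values) / len(values)                         # population mean
squared_deviations = [(x - mu) ** 2 for x in values]   # (X - mu)^2 for each value
variance = sum(squared_deviations) / len(values)       # divide by N
std_dev = variance ** 0.5                              # standard deviation

# statistics.pvariance applies the same population formula.
print(variance, statistics.pvariance(values))
```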
Mean and Mean Deviation: The average of numbers is known as the
mean and the arithmetic mean of the absolute deviations of the
observations from a measure of central tendency is known as the mean
deviation (also called mean absolute deviation).
Coefficient of Variation
The coefficient of variation (CV) provides a relative measure of the dispersion in data
relative to the mean: CV = standard deviation/mean. It can be expressed as a percentage.
It provides a relative measure of risk to return: the smaller the coefficient of variation,
the smaller the relative risk for the return provided.
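Both the mean deviation and the coefficient of variation are easy to compute
directly. In the sketch below the two sets of annual returns are hypothetical, and
the population standard deviation is used; a sample standard deviation would give
slightly different values.

```python
import statistics


def mean_absolute_deviation(values):
    """Average absolute distance of the observations from their mean."""
    center = statistics.mean(values)
    return sum(abs(v - center) for v in values) / len(values)


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (population form)."""
    return statistics.pstdev(values) / statistics.mean(values)


# Hypothetical annual returns (%) for two investments with the same mean.
fund_a = [8, 10, 12]
fund_b = [2, 10, 18]

print(mean_absolute_deviation(fund_a), coefficient_of_variation(fund_a))
print(mean_absolute_deviation(fund_b), coefficient_of_variation(fund_b))
```

Both funds average a 10% return, but fund_b has the larger coefficient of variation,
so it carries more relative risk for the same return.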
Look at the two graphs below. They both have μ = 0.6923 and σ = 0.1685, but their
shapes are different.
The beta distribution is one of the many skewed distributions that are used in
mathematical modelling.
The first one is moderately skewed left: the left tail is longer and most of the distribution
is at the right. By contrast, the second distribution is moderately skewed right: its right
tail is longer and most of the distribution is at the left.
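The direction of skew can be checked numerically. The sketch below uses the
population form of skewness (third central moment divided by the second central
moment raised to the power 1.5); sample formulas apply a small correction, and the
data are hypothetical.

```python
def skewness(values):
    """Population skewness: third central moment / (second central moment) ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


right_skewed = [1, 1, 2, 2, 3, 9]          # long right tail (hypothetical data)
left_skewed = [-v for v in right_skewed]   # mirror image: long left tail

print(skewness(right_skewed), skewness(left_skewed))
```

A positive result indicates a longer right tail, a negative result a longer left
tail, and a symmetric data set gives zero.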
Kurtosis
The other common measure of shape is called the kurtosis. As skewness involves the
third moment of the distribution, kurtosis involves the fourth moment. The outliers in
a sample, therefore, have even more effect on the kurtosis than they do on the skewness
and in a symmetric distribution both tails increase the kurtosis, unlike skewness where
they offset each other.
You may remember that the mean and standard deviation have the same units as the
original data, and the variance has the square of those units. However, the kurtosis, like
skewness, has no units: it’s a pure number, like a z-score.
Traditionally, kurtosis has been explained in terms of the central peak. You’ll see
statements like this one: Higher values indicate a higher, sharper peak; lower values
indicate a lower, less distinct peak. Balanda and MacGillivray (1988) also mention the
tails: increasing kurtosis is associated with the “movement of probability mass from the
shoulders of a distribution into its centre and tails.”
However, Peter Westfall (2014) has been on a bit of a crusade to change this perception,
and I think he makes a good case. We might say, following Wikipedia’s article on
kurtosis (accessed 15 May 2016), that “higher kurtosis means more of the variance is
the result of infrequent extreme deviations, as opposed to frequent modestly sized
deviations.” In other words, it’s the tails that mostly account for kurtosis, not the central
peak.
A normal distribution has kurtosis exactly 3 (excess kurtosis exactly 0). Any
distribution with kurtosis ≈3 (excess ≈0) is called mesokurtic.
A distribution with kurtosis <3 (excess kurtosis <0) is called platykurtic. Compared to
a normal distribution, its tails are shorter and thinner, and often its central peak is lower
and broader.
A distribution with kurtosis >3 (excess kurtosis >0) is called leptokurtic. Compared to
a normal distribution, its tails are longer and fatter, and often its central peak is higher
and sharper.
Note the word “often” in describing changes in the central peak due to changes in the
tails. Westfall 2014 gives several illustrations of counterexamples.
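The three categories can be illustrated with a short Python sketch based on the
population kurtosis (fourth central moment divided by the squared second central
moment). The tolerance used to decide what counts as "approximately 3" is an
arbitrary assumption, and both data sets are hypothetical.

```python
def kurtosis(values):
    """Population kurtosis: fourth central moment / (second central moment) ** 2."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2


def classify(values, tolerance=0.1):
    """Label a data set by its excess kurtosis (the tolerance is an assumption)."""
    excess = kurtosis(values) - 3
    if excess > tolerance:
        return "leptokurtic"
    if excess < -tolerance:
        return "platykurtic"
    return "mesokurtic"


flat = [1, 2, 3, 4, 5]                       # evenly spread, short tails
heavy_tailed = [-10, 0, 0, 0, 0, 0, 0, 10]   # mostly central, two extreme values

print(kurtosis(flat), classify(flat))
print(kurtosis(heavy_tailed), classify(heavy_tailed))
```

In the second data set the two extreme values account for all of the variance, which
is consistent with the point that the tails, not the peak, drive kurtosis.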
21.0 Excel Descriptive Statistics Tool
• In these situations, we cannot compute the mean or variance using the standard
formulas.
frequency
• Statistics such as means and variances are not appropriate for categorical data.
• If you look at the Value Field Settings dialog, you can see that you can calculate
the average, standard deviation, and variance of a value field.
• Two variables have a strong statistical relationship with one another if they
appear to move together, for example ice cream sales and hot weather.
• Sometimes, however, statistical relationships exist even though a change in one
variable is not caused by a change in the other.
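One common way to quantify how closely two variables move together is the Pearson
correlation coefficient, introduced here as an illustration rather than taken from
the text above. The temperature and sales figures are hypothetical.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical daily temperatures (°C) and ice cream sales (units).
temperature = [20, 22, 25, 27, 30, 33]
ice_cream_sales = [110, 125, 150, 160, 190, 215]

r = pearson_r(temperature, ice_cream_sales)
print(round(r, 3))
```

A value of r close to 1 shows that the two series rise together, but on its own it
does not show that one causes the other.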
23.0 Outliers
• Mean and range are sensitive to outliers, that is, unusually large or small values in the
data.
• The first thing to do from a practical perspective is to check the data for possible
errors, such as a misplaced decimal point or an incorrect transcription to a computer
file.
• We might use the empirical rule and z-scores to identify an outlier as one that
is more than three standard deviations from the mean.
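The three-standard-deviation rule can be written as a short Python function. The
order values below are hypothetical, and the sample standard deviation is used;
note that in very small data sets no value can reach a z-score of 3, so the rule
needs a reasonable number of observations.

```python
import statistics


def find_outliers(values, threshold=3.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    std_dev = statistics.stdev(values)  # sample standard deviation
    return [v for v in values if abs(v - mean) / std_dev > threshold]


# Hypothetical order values with one suspicious entry (120).
orders = [48, 49, 50, 51, 52] * 4 + [120]

print(find_outliers(orders))
```

Any value flagged this way should first be checked for data-entry errors before it
is excluded from the analysis.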
25.0 Conclusion
data distributions. For human geographers, it is often necessary to take into account the
locational references of the data we work with. The spatial descriptive statistics allow
analysts to assess the central tendency and variation of data in spatial context. The two
types of descriptive statistics are complementary. Combining both statistics, analysts
are able to study the geographic phenomena they work with.
While descriptive statistics are simple concepts in statistical analysis, they are important
and useful in today's era of big data. With increasingly large volumes of data being
produced constantly and distributed via the Internet, the effectiveness and usefulness of
descriptive statistics should not be overlooked.