
UNIVERSITY TECHNOLOGY MARA

KELANTAN

FIN534 – BUSINESS ANALYTICS AND FINANCIAL MODELLING


ASSIGNMENT 1 (CHAPTER 1 TO CHAPTER 4)

PREPARED BY:
NAME                                         STUDENT ID
MUHAMMAD FAIZ FAKHRUDDIN BIN SHIHABUDDIN     2019845414
MUHAMMAD ASYWAM BIN ASRI                     2019861586
MUHAMMAD HAZIM BIN MOHD ASRI                 2019620422
MUHAMMAD DZAHIRULHAQ BIN ZUHAIDI             2019800186
ZURAINA BINTI MAT DAUD                       2020678768

CLASS:
DBMF8A

PREPARED FOR:
WAN YUSROL RIZAL BIN W. YUSOF

SUBMISSION DATE:
13th JANUARY 2023
TABLE OF CONTENTS

CHAPTER 1 INTRODUCTION OF BUSINESS ANALYTICS
1.0 Introduction
2.0 Introduction of Business Analytics
3.0 Evolution of Business Analytics
4.0 Scope of Business Analytics
4.1 Business Analytics Tools
5.0 Data for Business Analytics
5.1 Classification of Data
5.2 Data Reliability and Validity
6.0 Models in Business Analytics
6.1 Uncertainty and Risk
6.2 Prescriptive Models
7.0 Problem Solving with Analytics

CHAPTER 2 ANALYTICS ON SPREADSHEETS
8.0 Basic Concepts and Procedures
8.1 Example: Demand Prediction Model
9.0 Basic Excel Functions
9.1 Other 'IF'-Type Functions
9.2 Functions for Specific Applications
10.0 Excel Lookup Functions
11.0 Spreadsheet Add-Ins for Business Analytics

CHAPTER 3 DATA VISUALIZATION AND EXPLORATION
12.0 Introduction of Data Visualization
13.0 Creating Charts Using Tableau
13.1 Data Queries
13.2 Statistical Methods for Summarizing Data
13.3 Frequency Distribution
13.4 Relative Frequency Distributions
13.5 Excel Histogram Tool
13.6 Cumulative Relative Frequency Distributions
13.7 Percentiles
13.8 Quartiles
13.9 Cross-Tabulation
13.10 Using Pivot Tables
13.11 Pivot Charts
13.12 Slicers and PivotTable Dashboards
14.0 Conclusion of Data Visualization and Exploration

CHAPTER 4 DESCRIPTIVE STATISTICAL MEASURES
15.0 Introduction
16.0 Population and Samples
17.0 Measures of Location
18.0 Measure of Dispersion
18.1 Types of Measures of Dispersion
18.1.1 Absolute Measure of Dispersion
18.1.1.1 Types of Absolute Measures of Dispersion
19.0 Standardized Value
20.0 Measure of Shape
21.0 Excel Descriptive Statistics Tool
21.1 Descriptive Statistics for Grouped Data
21.2 Descriptive Statistics for Categorical Data
21.3 Descriptive Statistics using PivotTables
22.0 Measures of Association
22.1 Measures of Association: Covariance
22.2 Measures of Association: Correlation
22.3 Excel Correlation Tool
23.0 Outliers
24.0 Statistical Thinking in Business Decisions
24.1 Variability in Samples
25.0 Conclusion
CHAPTER 1 BUSINESS ANALYTICS

1.0 Introduction

As Big Data continues to grow, so does the demand for business analytics professionals.
A deluge of information exists, spurred more than two decades ago by the internet and
accelerating at a significantly faster pace over the past few years. Long-used database
management and analysis techniques no longer suffice, particularly where unstructured
data such as product reviews and social media posts are concerned. Data analysis careers,
including business analytics as well as business intelligence and data science roles, address
new methods for organizing, gaining insights from, and making predictions with this
steadily increasing amount of information, and they frequently draw on computer science
and other technical knowledge.

Whether we have spent the past few decades in an analytics role or are only beginning to
explore the myriad applications of computer science, we might be wondering, "What is
business analytics?" This report offers a primer on this burgeoning field and the skills needed
to start a career in it.

2.0 Introduction of Business Analytics

Business analytics is the process of transforming data into insights to improve business
decisions. Data management, data visualization, predictive modelling, data mining,
forecasting, simulation, and optimization are some of the tools used to create insights from
data. Yet, while business analytics leans heavily on statistical, quantitative, and operational
analysis, developing data visualizations to present our findings and shape business
decisions is the end result. For this reason, balancing our technical background with strong
communication skills is imperative to do well in this field.

Furthermore, business analytics can also be described as the use of data, information
technology, statistical analysis, quantitative methods, and mathematical or computer-based
models to help managers gain improved insight into their business operations and make
better, fact-based decisions. Examples of decisions supported by business analytics include
pricing, customer segmentation, merchandising, location, supply chain, finance, marketing,
and HR. Business analytics is supported by tools such as Microsoft Excel and Excel add-ins,
commercial statistical packages such as SAS or Minitab, and more complex software.

3.0 Evolution of Business Analytics

Business analytics has been in existence for a long time and has evolved with the availability
of newer and better technologies. It has its roots in operations research, which was used
extensively during World War II. Operations research was an analytical way of looking at
data in order to plan military operations. Over time, this technique began to be applied in
business, and operations research evolved into management science. The basis of
management science remained the same as that of operations research: data, decision-making
models, and so on. As economies developed and companies became more and more
competitive, management science evolved into business intelligence, decision support
systems, and PC software.

Statistics provides the basic tools of description, exploration, estimation, and inference, as
well as more advanced techniques such as regression, forecasting, and data mining. Data
mining is concerned with understanding characteristics and patterns among variables in large
databases using a variety of statistical and analytical tools. Business intelligence and
information systems involve the collection, management, analysis, and reporting of data in
every discipline of business. Modelling and optimization techniques translate real problems
into mathematics, spreadsheets, or other computer languages and use them to find the best or
optimal solutions and decisions, while what-if analysis is used to assess the sensitivity of
optimization models to changes in data inputs. Simulation and risk analysis use spreadsheet
models and statistical analysis to examine the impacts of uncertainty in the estimates, and
their potential interaction with one another, on the output variables of interest.

An example is Analytics in Practice: Harrah's Entertainment. One of the most-cited
examples of the use of analytics in business is Harrah's Entertainment. Harrah's owns
numerous hotels and casinos and uses analytics to support revenue management activities,
which involve selling the right resources to the right customer at the right price to maximize
revenue and profit. The gaming industry views hotel rooms as incentives or rewards to
support casino gaming activities and revenues, not as revenue-maximizing assets.
Therefore, Harrah's objective is to set room rates and accept reservations to maximize the
expected gaming profits from customers. They begin by collecting and tracking customers'
activities (playing slot machines and casino games) using Harrah's "Total Rewards" card
program, a customer loyalty program that provides rewards such as meals, discounted rooms,
and other perks to customers based on the amount of money and time they spend at Harrah's.
The data collected are used to segment customers into more than 20 groups based on their
expected gaming activities. For each customer segment, analytics forecasts demand for hotel
rooms by arrival date and length of stay. Then Harrah's uses a prescriptive model to set prices
and allocate rooms to these customer segments. For example, the system might offer
complimentary rooms to customers who are expected to generate a gaming profit of at least
$400 but charge $325 for a room if the profit is expected to be only $100. Marketing can use
the information to send promotional offers to targeted customer segments if it identifies low
occupancy rates for specific dates.

4.0 Scope of Business Analytics

Business analytics begins with the collection, organization, and manipulation of data and is
supported by three major components:

i. Descriptive Analytics

The use of data to understand past and current business performance and make
informed decisions. These techniques categorize, characterize, consolidate, and
classify data to convert it into useful information for the purpose of
understanding and analysing business performance. Descriptive analytics
summarizes data into meaningful charts and reports, for example about
budgets, sales, revenue, or cost.

ii. Predictive Analytics


Seeks to predict the future by examining historical data, detecting patterns or
relationships in these data, and then extrapolating these relationships forward in
time. Predictive analytics can predict risk and find relationships in data not
readily apparent with traditional analyses. Using advanced techniques,
predictive analytics can help to detect hidden patterns in large quantities of data
to segment and group data into coherent sets to predict behaviour and detect
trends.

iii. Prescriptive Analytics


Uses optimization to identify the best alternatives to maximize or minimize
some objective. The mathematical and statistical techniques of predictive
analytics can also be combined with optimization to make decisions that take
into account the uncertainty in the data.

4.1 Business Analytics Tools


i) Database queries and analysis
ii) “Dashboard” to report key performance measures
iii) Data visualization
iv) Statistical Methods
v) Spreadsheets and predictive models
vi) Scenario and “what-if” analyses
vii) Simulation
viii) Forecasting
ix) Data and text mining
x) Optimization
xi) Social media, Web, and text analytics

Although the tools used in descriptive, predictive, and prescriptive analytics are
different, many applications involve all three.


5.0 Data for Business Analytics

Data are numerical facts and figures that are collected through some type of
measurement process. Information comes from analysing data, that is, from extracting
meaning from data to support evaluation and decision making. For example:

i) Annual reports summarize data about companies’ profitability and market share
both in numerical form and in charts and graphs to communicate with
shareholders.
ii) Marketing researchers collect and analyse extensive customer data. These data
often consist of demographics, preferences and opinions, transaction and payment
history, shopping behaviour, and a lot more. Such data may be collected by
surveys, personal interviews, focus group, or from shopper loyalty cards.
iii) A data set is simply a collection of data e.g. Student record.
iv) A database is a collection of related files containing records on people, places, or
things e.g. SIMS (Student Information Management Systems).
v) Measurement is the act of obtaining data associated with a metric. Measures are
numerical values associated with a metric. Metrics can be either discrete or
continuous.
a. A discrete metric is one that is derived from counting something e.g. a
delivery is either on time or not; an order is complete or incomplete.

b. Continuous metrics are based on a continuous scale of measurement. Any
metrics involving dollars, length, time, volume, or weight, for example are
continuous.

5.1 Classification of Data


Categorical (Nominal) Data: Data sorted into categories according to specified
characteristics, e.g. a firm's customers might be classified by their geographical region
(North America, South America, Europe, and Pacific); employees might be classified as
managers, supervisors, and associates.

Ordinal Data: Data that can be ordered or ranked according to some relationship to one
another. Ordinal data are more meaningful than categorical data because data can be
compared to one another. A common example in business is data from survey scales,
e.g. rating a service as poor, average, good, very good, or excellent. A higher ranking
signifies a better service but does not specify any numerical measure of strength.

Interval Data: Data that are ordinal but have constant differences between observations
and an arbitrary zero point. Common examples are time and temperature. Another
example is GPA or CGPA scores: the scores can be used to rank students, but only
differences between scores provide information on how much better one student
performed over another. In contrast to ordinal data, interval data allow meaningful
comparison of ranges, averages, and other statistics.

Ratio Data: Data that are continuous and have a natural zero. Most business and economic
data, such as dollars and time, fall into this category, e.g. the measure dollars has an
absolute zero.

5.2 Data Reliability and Validity


i) Poor data can result in poor decisions.
ii) Data used in business decisions need to be reliable and valid.
iii) Reliability means that data are accurate and consistent.
iv) Validity means that data correctly measure what they are supposed to
measure. For example:
a. The number of calls to a customer service desk might be counted
correctly each day (and thus is a reliable measure) but is not valid if
it is used to assess customer dissatisfaction, as many calls may be
simple queries.
b. A survey question that asks a customer to rate the quality of the food
in a restaurant may be neither reliable (because different customers
may have conflicting perceptions) nor valid (if the intent is to measure
customer satisfaction, as satisfaction generally includes other elements
of service besides food).

6.0 Models in Business Analytics

Many decision problems can be formalized using a model. A model is an abstraction or
representation of a real system, idea, or object. Models capture the most important features
of a problem and present them in a form that is easy to interpret. A model can be as simple
as a written or verbal description of some phenomenon, a visual representation such as a
graph or a flowchart, or a mathematical or spreadsheet representation. A model can be
descriptive, predictive, or prescriptive, and models are therefore used in a wide variety of
business analytics applications. A model is usually developed from theory or observation
and establishes relationships between actions that decision makers might take and the
results they might expect, thereby allowing decision makers to predict what might happen
based on the model.

6.1 Uncertainty and Risk

The future is always uncertain. Predictive models incorporate uncertainty and help
decision makers analyse the risks associated with their decisions. Uncertainty is
imperfect knowledge of what will happen. Risk is associated with the consequences
and likelihood of what might happen. For example, the change in the stock price
of Apple on the next day of trading is uncertain. However, if you own Apple stock,
then you face the risk of losing money if the stock price falls. If you don’t own any
stock, the price is still uncertain although you would not have any risk.

6.2 Prescriptive Models

A prescriptive decision model helps decision makers identify the best solution
to a decision problem. Optimization is the process of finding a set of values for
decision variables that minimizes or maximizes some quantity of interest, such as
profit, revenue, cost, or time, called the objective function. Any set of
decision variables that optimizes the objective function is called an optimal
solution. In a highly competitive world where one percentage point can mean a
difference of hundreds of thousands of dollars or more, knowing the best solution
can mean the difference between success and failure. Prescriptive decision models
can be either deterministic or stochastic. A deterministic model is one in which all
model input information is either known or assumed to be known with certainty. A
stochastic model is one in which some of the model input information is uncertain.
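As a purely illustrative sketch (not an example from the course material), the following deterministic model searches a range of candidate prices and picks the one that maximizes revenue; the demand function and price range are hypothetical.

```python
# Tiny deterministic prescriptive model: choose the price that maximizes
# revenue for an assumed linear demand curve. All numbers are illustrative.
def demand(price):
    return 20_000 - 100 * price          # hypothetical demand model

def revenue(price):
    return price * demand(price)         # objective function

candidate_prices = range(10, 201)        # values of the decision variable to search
best_price = max(candidate_prices, key=revenue)
print(best_price, revenue(best_price))   # optimal solution: 100 1000000
```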

7.0 Problem solving with Analytics


i) Recognizing a problem
ii) Defining the problem
iii) Structuring the problem
iv) Analysing the problem
v) Interpreting result and making a decision
vi) Implementing the solution

CHAPTER 2 ANALYTICS ON SPREADSHEETS

8.0 Basic Concepts and Procedures


i. Opening, saving, and printing files
ii. Using workbooks and worksheets
iii. Moving around a spreadsheet
iv. Selecting cells and ranges
v. Inserting or deleting rows and columns
vi. Entering and editing text, numerical data and formulas in cells
vii. Formatting data (number, currency, decimal places etc.)
viii. Working with text strings
ix. Formatting data and text
x. Modifying the appearance of the spreadsheet using borders, shading and so on.

8.1 Example: Demand Prediction Model

Assume that you have two models for predicting demand as a function of
price, for example:
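The original figure with the two models is not reproduced here; as a stand-in, the sketch below shows one linear and one nonlinear (constant-elasticity) demand model with made-up parameters.

```python
# Two hypothetical demand-prediction models; parameter values are illustrative,
# not taken from the assignment's figure.
def linear_demand(price, intercept=20_000.0, slope=10.0):
    """Demand falls by `slope` units for each one-unit price increase."""
    return intercept - slope * price

def nonlinear_demand(price, scale=20_000.0, elasticity=1.2):
    """Constant-elasticity model: demand = scale * price**(-elasticity)."""
    return scale * price ** (-elasticity)

for price in (80, 90, 100):
    print(price, round(linear_demand(price)), round(nonlinear_demand(price)))
```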

9.0 Basic Excel Functions

9.1 Other ‘IF’ – type Functions

To embed “IF” logic within mathematical functions
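Excel's IF-type functions (for example COUNTIF, SUMIF, and AVERAGEIF) combine a condition with an aggregation. A rough Python equivalent, with made-up order data, is shown below.

```python
# Rough Python equivalents of Excel's COUNTIF/SUMIF/AVERAGEIF, which embed
# an IF condition inside an aggregation. The order records are hypothetical.
orders = [("East", 120), ("West", 95), ("East", 300), ("South", 40)]

count_east = sum(1 for region, _ in orders if region == "East")             # COUNTIF
sum_east = sum(amount for region, amount in orders if region == "East")     # SUMIF
avg_east = sum_east / count_east                                            # AVERAGEIF

print(count_east, sum_east, avg_east)   # 2 420 210.0
```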

9.2 Functions for Specific Applications


i. NPV Function

ii. IF Function
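Excel's NPV(rate, values) discounts the first cash flow by one full period. A small Python sketch of that convention, with illustrative cash flows, is:

```python
# Python version of Excel's NPV convention: the first cash flow in `values`
# is treated as occurring one period from now. Cash flows are illustrative.
def npv(rate, values):
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(values, start=1))

cash_flows = [5_000, 6_000, 7_000]       # end of years 1-3
print(round(npv(0.10, cash_flows), 2))   # ≈ 14763.34; a time-0 outlay would be added separately
```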

10.0 Excel Lookup Functions

Functions for finding specific data in a spreadsheet
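As a sketch of the idea, a dictionary lookup behaves like an exact-match VLOOKUP; the price table below is hypothetical.

```python
# A dictionary lookup plays the role of VLOOKUP/HLOOKUP with exact matching.
# The price table is made up for the example.
price_table = {"A100": 19.90, "A200": 24.50, "B300": 7.25}

def vlookup_exact(key, table, default=None):
    """Return the value for `key`, like VLOOKUP(key, table, 2, FALSE)."""
    return table.get(key, default)

print(vlookup_exact("A200", price_table))   # 24.5
print(vlookup_exact("Z999", price_table))   # None (Excel would return #N/A)
```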

11.0 Spreadsheet Add-Ins for Business Analytics

CHAPTER 3 DATA VISUALIZATION AND EXPLORATION

12.0 Introduction of Data Visualization

Converting data into information to understand past and current performance is the core
of descriptive analytics and is vital to making good business decisions. Techniques for
doing this range from plotting data on charts and extracting data from databases to
manipulating and summarizing data. Data visualization is the process of displaying data
in a meaningful fashion to provide insights that will support better decisions. Visualizing
data provides a way of communicating data at all levels of a business and can reveal
surprising patterns and relationships.

A dashboard is a visual representation of a set of key business measures. It is derived
from the analogy of an automobile's control panel, which displays speed, gasoline level,
temperature, and so on. Dashboards provide important summaries of key business
information to help manage a business process or function. A dashboard might include
tabular as well as visual data to allow managers to quickly locate key data.

13.0 Creating Charts Using Tableau


Excel is everywhere. It is the go-to analysis tool and spreadsheet software for many
business users. With Tableau, it becomes even more powerful. With a drag-and-drop
approach to visual analysis, Tableau makes exploring Excel data faster and easier. You can
ask and answer questions as you go, instead of running separate reports or cross-tabs for
every question. Tableau visualizations are interactive and highly shareable, helping
everyone in the business get answers.

Tableau natively connects to Excel spreadsheets to make data analysis fast and simple.
Tableau allows Excel users to keep their spreadsheets while greatly enhancing their
ability to analyse their data, through visualizations that are simple to build, simple to
read, and convey information clearly.

13.1 Data Queries


A database query is either an action query or a select query. A select query is
one that retrieves data from a database. An action query asks for an additional
operation on the data, such as insertion, updating, deletion, or another form of
data manipulation.

This does not mean that users simply type in random requests. For a database to
understand a request, it must receive a query written in predefined code. That code is a
query language, such as SQL.
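As an illustration of the two query types, the sketch below uses Python's built-in sqlite3 module with a made-up table: the SELECT statement is a select query and the UPDATE statement is an action query.

```python
# Illustrative select vs. action queries using Python's built-in sqlite3.
# Table and rows are invented for the example.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, region TEXT, sales REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(1, "East", 120.0), (2, "West", 95.0), (3, "East", 300.0)])

# Select query: retrieves data without changing it.
rows = conn.execute(
    "SELECT region, SUM(sales) FROM customers GROUP BY region ORDER BY region"
).fetchall()
print(rows)   # [('East', 420.0), ('West', 95.0)]

# Action query: modifies the stored data (here, an UPDATE).
conn.execute("UPDATE customers SET sales = sales * 1.1 WHERE region = 'West'")
conn.commit()
```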

13.2 Statistical Methods for Summarizing Data

A statistic is a summary measure of data.
i) Microsoft Excel supports statistical analysis in two ways:
o With statistical functions that are entered in worksheet cells directly or
embedded in formulas
o With the Excel Analysis ToolPak add-in, to perform more complex
statistical computations.
ii) Descriptive statistics refers to methods of describing and summarizing data
using tabular, visual, and quantitative techniques.
13.3 Frequency Distribution

A frequency distribution is a table that shows the number of observations in each
of several non-overlapping groups. Categorical variables naturally define the
groups in a frequency distribution. To construct a frequency distribution, we
need only count the number of observations that appear in each category; this
can be done using the Excel COUNTIF function.

13.4 Relative Frequency Distributions

We may express the frequencies as a fraction or proportion of the total; this is
called the relative frequency. If a data set has n observations, the relative frequency
of category i is computed as:

Relative frequency of category i = (Frequency of category i) / n

We often multiply the relative frequencies by 100 to express them as
percentages. A relative frequency distribution is a tabular summary of the
relative frequencies of all categories.
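A minimal Python sketch of a frequency and relative frequency distribution (the equivalent of COUNTIF followed by division by n), using made-up survey ratings:

```python
# Counting a frequency distribution and converting it to relative frequencies.
# The ratings data are invented for the example.
from collections import Counter

ratings = ["good", "poor", "good", "excellent", "good", "average", "excellent"]
freq = Counter(ratings)                  # frequency distribution
n = len(ratings)
rel_freq = {category: count / n for category, count in freq.items()}

for category in freq:
    print(f"{category:9s} frequency={freq[category]}  relative={rel_freq[category]:.3f}")
```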

13.5 Excel Histogram Tool

Frequency distributions and histograms can be created using the Analysis ToolPak
in Excel. To do this, click the Data Analysis button in the Analysis group
under the Data tab on the Excel ribbon and select Histogram from the list. In
the dialog box, specify the Input Range corresponding to the data. If you include
the column header, then also check the Labels box so Excel knows that the range
contains a label. The Bin Range defines the groups used for the frequency
distribution. If you do not specify a Bin Range, Excel will automatically
determine bin values for the frequency distribution and histogram, which often
results in a rather poor choice. If you have discrete values, set up a column of
these values in your spreadsheet for the bin range and specify this range in the
Bin Range field. Check the Chart Output box to display a histogram in addition to
the frequency distribution. You may also sort the values as a Pareto chart and
display the cumulative frequencies by checking the additional boxes.
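The same idea of supplying an explicit bin range can be sketched in Python with numpy.histogram; the data and bin edges below are illustrative.

```python
# Binning numeric data the way the Histogram tool does when a Bin Range is
# supplied. Data and bin edges are made up.
import numpy as np

costs = np.array([12, 15, 22, 27, 31, 35, 38, 44, 47, 52])
bin_edges = [10, 20, 30, 40, 50, 60]            # analogous to an explicit Bin Range
counts, edges = np.histogram(costs, bins=bin_edges)

for lo, hi, c in zip(edges[:-1], edges[1:], counts):
    print(f"{lo:>3}-{hi:<3}: {c}")
```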

13.6 Cumulative Relative Frequency Distributions

The cumulative relative frequency represents the proportion of the total number
of observations that fall at or below the upper limit of each group. A tabular
summary of cumulative relative frequencies is called a cumulative relative
frequency distribution.

13.7 Percentiles

Percentiles specify the percentage of observations (for example, other test takers)
that fall at or below the value of a particular individual observation. The most
common way to compute the kth percentile is to order the data values from smallest
to largest and calculate the rank of the kth percentile using the formula:

rank = nk/100 + 0.5
13.8 Quartiles

Quartiles break the data into four parts.

1) The 25th percentile is called the first quartile, Q1.
2) The 50th percentile is called the second quartile, Q2.
3) The 75th percentile is called the third quartile, Q3.
4) The 100th percentile is called the fourth quartile, Q4.

One-fourth of the data fall below the first quartile, one-half are below the second,
and three-fourths are below the third quartile. We may compute quartiles using
the Excel function QUARTILE.INC(array, quart), where array specifies the
range of the data and quart is a whole number between 0 and 4 designating the
desired quartile.
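A short Python sketch of percentiles and quartiles; numpy's default linear interpolation gives values close to what PERCENTILE.INC/QUARTILE.INC report (stated here as an assumption), and the last line applies the rank formula nk/100 + 0.5 from Section 13.7. The scores are made up.

```python
# Percentiles and quartiles for an illustrative set of scores.
import numpy as np

scores = np.array([42, 55, 61, 67, 70, 74, 78, 83, 90, 96])

q1, q2, q3 = np.percentile(scores, [25, 50, 75])
print(q1, q2, q3)                 # first, second (median), and third quartiles

# Rank-based rule from the text: position of the kth percentile is nk/100 + 0.5.
n, k = len(scores), 90
print(n * k / 100 + 0.5)          # 9.5 -> between the 9th and 10th ordered values
```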

13.9 Cross-Tabulation

A cross-tabulation is a tabular method that displays the number of observations
in a data set for different subcategories of two categorical variables.
13.10 Using Pivot Tables

PivotTables allow you to create summaries and charts of key information in the
data. PivotTables can be used to quickly create cross-tabulations and to drill
down into a large set of data in numerous ways.
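For comparison, pandas offers crosstab and pivot_table, which produce the same kind of summaries as an Excel cross-tabulation or PivotTable; the sales records below are hypothetical.

```python
# A pandas pivot_table / crosstab gives PivotTable-style summaries.
# The sales records are invented for the example.
import pandas as pd

sales = pd.DataFrame({
    "region":  ["East", "East", "West", "West", "East"],
    "product": ["A", "B", "A", "A", "B"],
    "amount":  [120, 95, 300, 40, 150],
})

# Cross-tabulation: counts of observations by region and product.
print(pd.crosstab(sales["region"], sales["product"]))

# PivotTable-style summary: total amount by region and product.
print(sales.pivot_table(index="region", columns="product",
                        values="amount", aggfunc="sum", fill_value=0))
```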

13.11 Pivot Charts

A pivot chart is the visual representation of a PivotTable in Excel. Pivot charts
and PivotTables are connected to each other.

13.12 Slicers and PivotTable Dashboards

A slicer is a tool for drilling down to "slice" a PivotTable and display a subset
of data. To create a slicer for any of the columns in the database, click on the
PivotTable and choose Insert Slicer from the Analyze tab in the PivotTable Tools
ribbon.

14.0 Conclusion of Data Visualization and Exploration

Good data visualization should communicate a data set clearly and effectively through
graphics. The best visualizations make it easy to comprehend data at a glance.

CHAPTER 4: DESCRIPTIVE STATISTICAL MEASURES

15.0 Introduction

Frequency distributions, histograms, and cross-tabulations are tabular and visual tools
of descriptive statistics. In this chapter, we introduce numerical measures that provide
an effective and efficient way of obtaining meaningful information from data.

16.0 Population and Samples

Population is the entire set of items from which you draw data for a statistical study. It
can be a group of individuals, a set of items, etc. It makes up the data pool for a study.
Generally, population refers to the people who live in a particular area at a specific time.
But in statistics, population refers to data on your study of interest. It can be a group of
individuals, objects, events, organizations, etc. You use populations to draw
conclusions.

Figure 1: Population

An example of a population would be the entire student body at a school. It would
contain all the students who study in that school at the time of data collection.
Depending on the problem statement, data from each of these students is collected.
An example is the students who speak Hindi among the students of a school.

For the above situation, it is easy to collect data. The population is small and willing to
provide data and can be contacted. The data collected will be complete and reliable.

If you had to collect the same data from a larger population, say the entire country of
India, it would be impossible to draw reliable conclusions because of geographical and
accessibility constraints, not to mention time and resource constraints. A lot of data
would be missing or might be unreliable. Furthermore, due to accessibility issues,
marginalized tribes or villages might not provide data at all, making the data biased
towards certain regions or groups.

What is a Sample?

A sample is defined as a smaller and more manageable representation of a larger
group: a subset of a larger population that contains the characteristics of that population.
A sample is used in statistical testing when the population size is too large for all members
or observations to be included in the test.

The sample is an unbiased subset of the population that best represents the whole data.

To overcome the constraints of a population, you can sometimes collect data from a
subset of your population and then consider it as the general norm. You collect the
subset information from the groups who have taken part in the study, making the data
reliable. The results obtained for the different groups who took part in the study can then
be extrapolated to generalize about the population.

Figure 2: Sample

The process of collecting data from a small subsection of the population and then using
it to generalize over the entire set is called Sampling.

Understanding Statistical Notations

Measure                 Population              Sample
Mean                    μ                       x̄
Standard deviation      σ                       s
Proportion              π                       p
Sum of observations     Σ xi (i = 1, …, N)      Σ xi (i = 1, …, n)
Size                    N                       n

17.0 Measures of Location

Measures of location summarize a list of numbers by a "typical" value. The three most
common measures of location are the mean, the median, and the mode. The mean is the
sum of the values divided by the number of values; it has the smallest possible sum of
squared differences from the members of the list.

Arithmetic mean: the total of all observations divided by the number of observations;
also known as the mean or average. Excel: =AVERAGE(data range)
Median: the middle value when the data are arranged from lowest to highest.
Excel: =MEDIAN(data range)
Mode: the observation that occurs most frequently. Excel: =MODE.SNGL(data range)
Midrange: the average of the greatest and least values in the data set.
Excel: =(MAX(data range) + MIN(data range))/2

Measures of location provide an estimate of a single value that represents the "centering"
of a set of data, e.g. measuring student accomplishment in college (GPA/CGPA), measuring
the performance of sports teams (club rank), and measuring performance in business
(average delivery time).
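A quick computational sketch of these measures with Python's statistics module, using an illustrative data set:

```python
# Measures of location, alongside the Excel functions listed above.
# The delivery-time data are invented for the example.
import statistics

delivery_days = [3, 5, 4, 4, 6, 2, 4, 7]

mean = statistics.mean(delivery_days)              # =AVERAGE(...)
median = statistics.median(delivery_days)          # =MEDIAN(...)
mode = statistics.mode(delivery_days)              # =MODE.SNGL(...)
midrange = (max(delivery_days) + min(delivery_days)) / 2

print(mean, median, mode, midrange)                # 4.375 4.0 4 4.5
```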

18.0 Measure of Dispersion

Dispersion is the state of getting dispersed or spread. Statistical dispersion means the
extent to which numerical data is likely to vary about an average value. In other words,
dispersion helps to understand the distribution of the data.

In statistics, the measures of dispersion help to interpret the variability of data, i.e. to
show how homogeneous or heterogeneous the data are. In simple terms, they show
how squeezed or scattered the variable is.

18.1 Types of Measures of Dispersion


i) Absolute Measure of Dispersion
ii) Relative Measure of Dispersion

18.1.1 Absolute Measure of Dispersion

An absolute measure of dispersion is expressed in the same units as the original
data set. The absolute dispersion method expresses the variations in terms of the
average of the deviations of observations, like the standard or mean deviation.
It includes the range, standard deviation, quartile deviation, etc.

18.1.1.1 Types of absolute measures of dispersion are:

Range: simply the difference between the maximum value and the minimum value
in a data set. Example: for the data 1, 3, 5, 6, 7, the range = 7 − 1 = 6.

Variance: subtract the mean from each value in the set, square each difference,
add the squares, and finally divide by the total number of values in the data set:
variance σ² = Σ(X − μ)² / N.

Standard Deviation: the square root of the variance, i.e. S.D. = √σ² = σ.

Quartiles and Quartile Deviation: the quartiles are values that divide a list of
numbers into quarters. The quartile deviation is half of the distance between the
third and the first quartile.

Mean and Mean Deviation: the average of the numbers is known as the mean, and
the arithmetic mean of the absolute deviations of the observations from a measure
of central tendency is known as the mean deviation (also called the mean absolute
deviation).
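The absolute measures of dispersion above can be computed as follows; the data set is the small illustrative one used for the range example (1, 3, 5, 6, 7).

```python
# Absolute measures of dispersion for a small, made-up data set.
import statistics

data = [1, 3, 5, 6, 7]
mean = statistics.mean(data)                                  # 4.4

data_range = max(data) - min(data)                            # 7 - 1 = 6
variance = statistics.pvariance(data)                         # population variance, Σ(x-μ)²/N
std_dev = statistics.pstdev(data)                             # square root of the variance
mean_dev = sum(abs(x - mean) for x in data) / len(data)       # mean absolute deviation

q1, q2, q3 = statistics.quantiles(data, n=4)                  # quartiles ('exclusive' default method)
quartile_dev = (q3 - q1) / 2                                  # half the interquartile range

print(data_range, variance, std_dev, mean_dev, quartile_dev)
```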

19.0 Standardized Value

The standardized value, commonly called a z-score, provides a relative measure of the
distance an observation is from the mean. The z-score is calculated as z = (x − x̄) / s.
By dividing by the standard deviation, s, we scale the distance from the mean to express
it in units of standard deviations. Thus, a z-score of 1.0 means that the observation is one
standard deviation to the right of the mean; a z-score of −1.5 means that the observation
is 1.5 standard deviations to the left of the mean. Even though two data sets may have
different means and standard deviations, the same z-score means that the observations
have the same relative distance from their respective means.

Coefficient of Variation

The coefficient of variation provides a relative measure of the dispersion in data relative
to the mean: CV = standard deviation / mean. It can also be expressed as a percentage.
In finance it provides a relative measure of risk to return: the smaller the coefficient of
variation, the smaller the relative risk is for the return provided.
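A short sketch of z-scores and the coefficient of variation using sample statistics; the returns are hypothetical.

```python
# Standardized values (z-scores) and the coefficient of variation.
# The return data are invented for the example.
import statistics

returns = [0.04, 0.07, 0.02, 0.10, 0.05, 0.08]
mean = statistics.mean(returns)
s = statistics.stdev(returns)                    # sample standard deviation

z_scores = [(x - mean) / s for x in returns]     # z = (x - x̄) / s
cv = s / mean                                    # coefficient of variation

print([round(z, 2) for z in z_scores])
print(round(cv, 3))                              # smaller CV -> smaller relative risk
```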

20.0 Measure of Shape


Measures of shape describe the distribution (or pattern) of the data within a dataset. The
first thing you usually notice about a distribution’s shape is whether it has one mode
(peak) or more than one. If it’s unimodal (has just one peak), like most data sets, the
next thing you notice is whether it’s symmetric or skewed to one side. If the bulk of
the data is at the left and the right tail is longer, we say that the distribution is skewed
right or positively skewed; if the peak is toward the right and the left tail is longer, we
say that the distribution is skewed left or negatively skewed.

Look at the two graphs below. They both have μ = 0.6923 and σ = 0.1685, but their
shapes are different.

The beta distribution is one of the many skewed distributions that are used in
mathematical modelling.

Figure: Beta(α=4.5, β=2), skewness = −0.5370 (left), and its mirror image
1.3846 − Beta(α=4.5, β=2), skewness = +0.5370 (right).

The first one is moderately skewed left: the left tail is longer and most of the distribution
is at the right. By contrast, the second distribution is moderately skewed right: its right
tail is longer and most of the distribution is at the left.

You can get a general impression of skewness by drawing a histogram, but there are also
some common numerical measures of skewness. Some authors favour one, some favour
another; the measure used here is the same formula that Excel uses in its Descriptive
Statistics tool in the Analysis ToolPak and in the SKEW() function.
You may remember that the mean and standard deviation have the same units as the
original data, and the variance has the square of those units. However, the skewness has
no units: it’s a pure number, like a z-score.

Kurtosis

The other common measure of shape is called the kurtosis. As skewness involves the
third moment of the distribution, kurtosis involves the fourth moment. The outliers in
a sample, therefore, have even more effect on the kurtosis than they do on the skewness
and in a symmetric distribution both tails increase the kurtosis, unlike skewness where
they offset each other.

You may remember that the mean and standard deviation have the same units as the
original data, and the variance has the square of those units. However, the kurtosis, like
skewness, has no units: it’s a pure number, like a z-score.

Traditionally, kurtosis has been explained in terms of the central peak. You’ll see
statements like this one: Higher values indicate a higher, sharper peak; lower values
indicate a lower, less distinct peak. Balanda and MacGillivray (1988) also mention the
tails: increasing kurtosis is associated with the “movement of probability mass from the
shoulders of a distribution into its centre and tails.”

However, Peter Westfall (2014) has argued for changing this perception, and he makes a
good case. Following Wikipedia's article on kurtosis (accessed 15 May 2016), we might
say that "higher kurtosis means more of the variance is the result of infrequent extreme
deviations, as opposed to frequent modestly sized deviations." In other words, it is the
tails that mostly account for kurtosis, not the central peak.

The reference standard is a normal distribution, which has a kurtosis of 3. In token of


this, often the excess kurtosis is presented: excess kurtosis is simply kurtosis−3. For
example, the “kurtosis” reported by Excel is actually the excess kurtosis.

A normal distribution has kurtosis exactly 3 (excess kurtosis exactly 0). Any
distribution with kurtosis ≈ 3 (excess kurtosis ≈ 0) is called mesokurtic.

A distribution with kurtosis <3 (excess kurtosis <0) is called platykurtic. Compared to
a normal distribution, its tails are shorter and thinner, and often its central peak is lower
and broader.

A distribution with kurtosis >3 (excess kurtosis >0) is called leptokurtic. Compared to
a normal distribution, its tails are longer and fatter, and often its central peak is higher
and sharper.

Note the word "often" in describing changes in the central peak due to changes in the
tails; Westfall (2014) gives several illustrations of counterexamples.
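Sample skewness and excess kurtosis can be computed in Python; pandas' skew() and kurt() use bias-corrected formulas that are generally described as matching Excel's SKEW() and KURT() (treated here as an assumption). The data are illustrative.

```python
# Sample skewness and excess kurtosis for an illustrative, right-skewed data set.
import pandas as pd

x = pd.Series([2, 3, 3, 4, 4, 4, 5, 5, 9, 14])

print(round(x.skew(), 4))   # > 0 -> skewed right (longer right tail)
print(round(x.kurt(), 4))   # excess kurtosis; > 0 -> heavier tails than a normal distribution
```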

21.0 Excel Descriptive Statistics Tool

• Data > Data Analysis > Descriptive Statistics

21.1 Descriptive Statistics for Grouped Data

• In some situations, data may already be grouped in a frequency distribution, and
we may not have access to the raw data.

• In these situations, we cannot compute the mean or variance using the standard
formulas; each group must instead be weighted by its frequency. For group values x_i
with frequencies f_i, the sample mean is x̄ = Σ f_i x_i / n and the sample variance is
s² = Σ f_i (x_i − x̄)² / (n − 1).

21.2 Descriptive Statistics for Categorical Data

• Statistics such as means and variances are not appropriate for categorical data.

• Proportions or fractions, usually denoted by p, are key descriptive statistics for
categorical data.

• It is important to realize that proportions are numbers between 0 and 1.

21.3 Descriptive Statistics using PivotTables

• PivotTables have the functionality to calculate many basic statistical measures
from the data summaries.

• If you look at the Value Field Settings dialog, you can see that you can calculate
the average, standard deviation, and variance of a value field.

22.0 Measures of Association

• Two variables have a strong statistical relationship with one another if they
appear to move together. e.g. ice cream sales and hot weather.

• When two variables appear to be related, you might suspect a cause-and-effect
relationship.

• Sometimes, however, statistical relationships exist even though a change in one
variable is not caused by a change in the other.

22.1 Measures of Association: Covariance

• Covariance is a measure of the linear association between two variables, X and Y.

• Excel function = COVARIANCE.P (array1, array2).

• Excel function = COVARIANCE.S (array1, array2).

22.2 Measures of Association: Correlation

• The numerical value of the covariance is generally difficult to interpret because
it depends on the units of measurement of the variables.

• Correlation is a measure of the linear relationship between two variables, X and Y,
which does not depend on the units of measurement.

• Correlation is measured by the correlation coefficient, also known as the Pearson
product moment correlation coefficient.

• Excel function = CORREL (array1, array2)
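A small numpy sketch of population and sample covariance and the correlation coefficient, mirroring COVARIANCE.P, COVARIANCE.S, and CORREL; the data are made up.

```python
# Covariance and correlation for two illustrative, related variables.
import numpy as np

temperature = np.array([22, 25, 27, 30, 33, 35])
ice_cream   = np.array([110, 130, 150, 170, 200, 220])

cov_p = np.cov(temperature, ice_cream, bias=True)[0, 1]    # like COVARIANCE.P
cov_s = np.cov(temperature, ice_cream, bias=False)[0, 1]   # like COVARIANCE.S
corr  = np.corrcoef(temperature, ice_cream)[0, 1]          # like CORREL

print(round(cov_p, 2), round(cov_s, 2), round(corr, 4))
```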

22.3 Excel Correlation Tool

• Data > Data Analysis > Correlation

23.0 Outliers

• The mean and range are sensitive to outliers, that is, unusually large or small values
in the data.

• Outliers can make a significant difference in statistical analyses results.

• The first thing to do from a practical perspective is to check the data for possible
errors, such as a misplaced decimal point or an incorrect transcription to a computer
file.

• Histograms can help to identify possible outliers visually.

• We might use the empirical rule and z-scores to identify an outlier as one that
is more than three standard deviations from the mean.
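A minimal sketch of the empirical-rule screen described above, flagging observations more than three standard deviations from the mean; the values (including the suspect 150) are invented.

```python
# Flag possible outliers as observations more than three standard deviations
# from the mean. The data, with a suspect entry of 150, are made up.
import statistics

values = [48, 50, 52, 49, 51, 50, 47, 53, 50, 49,
          51, 52, 48, 50, 49, 51, 50, 52, 49, 150]

mean = statistics.mean(values)
s = statistics.stdev(values)

outliers = [x for x in values if abs(x - mean) / s > 3]
print(outliers)   # [150]
```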

24.0 Statistical Thinking in Business Decisions

• Statistical thinking is a philosophy of learning and action for improvement that
is based on the principles that

• all work occurs in a system of interconnected processes;

• variation exists in all processes; and

• better performance results from understanding and reducing variation.

24.1 Variability in Samples

• Because we usually deal with sample data in business analytics applications, it
is extremely important to understand that different samples from any population will
vary; that is, they will have different means, standard deviations, and other statistical
measures and will have differences in the shapes of histograms.

• In particular, samples are extremely sensitive to the sample size, that is, the number
of observations included in the samples.
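A small simulation illustrates this point: repeated samples from the same population have different means, and the variation shrinks as the sample size grows. The population parameters below are arbitrary.

```python
# Demonstration that sample means vary from sample to sample, and that
# larger samples vary less. Population parameters are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
population = rng.normal(loc=100, scale=15, size=100_000)

for size in (10, 100, 1000):
    means = [rng.choice(population, size=size).mean() for _ in range(5)]
    print(size, [round(m, 2) for m in means])
```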

25.0 Conclusion

Descriptive statistics provide summary information about the characteristics and
distribution of values in one or more data sets. The classical descriptive statistics allow
analysts to take a quick glance at the central tendency and the degree of dispersion of
values in data sets. They are useful in understanding a data distribution and in comparing
data distributions. For human geographers, it is often necessary to take into account the
locational references of the data we work with. Spatial descriptive statistics allow
analysts to assess the central tendency and variation of data in a spatial context. The two
types of descriptive statistics are complementary; combining both, analysts are able to
study the geographic phenomena they work with.

While descriptive statistics are simple concepts in statistical analysis, they are important
and useful in today's era of big data. With increasingly large volumes of data being
produced constantly and distributed via the Internet, the effectiveness and usefulness of
descriptive statistics should not be overlooked.

