
QM 1

Uploaded by

akashkm2710
Copyright
© © All Rights Reserved
Quantitative Methods

Data is a valuable resource in today’s ever-changing marketplace. For business professionals, knowing how to interpret and communicate data is an indispensable skill that can inform sound decision-making.



Data and Classification of Data
Data are any number of related observations.

A single observation is known as a data point. A collection of data points is a data set.
Data can be qualitative or quantitative

Quantitative data are numerical data that can be measured

Qualitative data are data for which the measurement scale is categorical
Classification
Data
• Qualitative
• Quantitative: Discrete or Continuous
Processing Data
• Data: initial observation (RQ)
• Theory (literature)
• Generate hypothesis (variables)
• Collect data to test predictions (measure variables)
• Analyze data (fit a model)
Making Sense from Data
• The growing amount of data and improved computational
capacity are altering how businesses collect data, process
information, and make choices.

• Analytics is a scientific approach that involves utilizing large amounts of data to find models that inform judgements and actions. This makes data visualization an integral part of Business Intelligence.

• Analytics is applied in Operations, Marketing, Finance, and Strategic Planning, among other functions.



Management based on data and models
• Data and its analysis help in searching for possible solutions and evaluating them.
• Data insights can systematically test the robustness of solutions w.r.t. changes in the business environment (what-if and sensitivity analysis).
• Data help generate insights, which lead to better decision making.
• Data analytical tools, used repeatedly, hone naïve intuition into instinct.
Implementation
• Improving Productivity and Collaboration at Microsoft
• At technology giant Microsoft, collaboration is key to a productive, innovative work
environment.
• Microsoft’s Workplace Analytics team hypothesized that moving the 1,200-person
group from five buildings to four could improve collaboration by increasing the number
of employees per building and reducing the distance that staff needed to travel for
meetings.
• In an article for the Harvard Business Review, the company’s analytics team shared the
outcomes they observed as a result of the relocation.
• Through looking at metadata attached to employee calendars, the team found that the
move resulted in a 46 percent decrease in meeting travel time. This translated into a
combined 100 hours saved per week across all relocated staff members and an
estimated savings of $520,000 per year in employee time.
HBR article: Matt Gavin
Targeting Consumers at PepsiCo
• To ensure the right quantities and types of products are available to consumers
in certain locations, PepsiCo uses big data analytics.
• PepsiCo created a cloud-based data and analytics platform called Pep Worx to
make more informed decisions regarding product merchandising.
• With Pep Worx, the company identifies shoppers in the United States who are
likely to be highly interested in a specific PepsiCo brand or product.
• For example, Pep Worx enabled PepsiCo to distinguish 24 million households
from its dataset of 110 million US households that would be most likely to be
interested in Quaker Overnight Oats.
• The company then identified specific retailers that these households might
shop at and targeted their unique audiences.
• Ultimately, these customers drove 80 percent of the product’s sales growth in
its first 12 months after launch.

HBR article: Matt Gavin


1854 Cholera Outbreak - Snow's Map
(Diagnostic Analysis)

He created a map depicting where cases of cholera occurred in London’s West End and found them to be clustered
around a water pump on Broad Street.
Analytics

Source: Prof. David Simchi Levi


Descriptive Statistics
• Descriptive analytics examines what happened in the past. We
are utilizing descriptive analytics when we examine past data
sets for patterns and trends.
• Descriptive analytics functions by identifying what metrics you
want to measure, collecting that data, and analyzing it.
• It turns the stream of facts the business has collected into
information we can act on, plan around, and measure.
• Examples of descriptive analytics include:
• Annual revenue reports
• Year-over-year sales reports
Decision making approach

• What is the best estimate of population of Sri Lanka?

A. 50 million
B. 52 million
C. 22 million
D. 49 million
Scale of Measurement
Likely to encounter these terms:
▪ Data are the facts and figures that are collected, summarized, analyzed,
and interpreted
▪ Elements are the entities on which data are collected
▪ A variable is a characteristic of interest for the elements
▪ A data set with n elements contains n observations
▪ Predictor variable: A variable thought to predict an outcome variable. This
term is basically another way of saying ‘independent variable or cause’.
▪ Outcome variable: A variable thought to change as a function of changes in a predictor variable. This term is another way of saying ‘dependent variable or effect’.
▪ Variables are measured constructs that vary across entities in the sample.
▪ In contrast, parameters are not measured and are (usually) constants
believed to represent some fundamental truth about the relations
between variables in the model. (mean, median and correlation,
regression)
For Instance
[Figure: example data table with labels marking the name of each element and the variables]
Types of Measurement scale
• Variables can be split into categorical and continuous, and within these types
there are different levels of measurement:
• Categorical (entities are divided into distinct categories):
• Binary variable: There are only two categories (e.g., dead or alive).
• Nominal variable: There are more than two categories (e.g., whether someone is an
omnivore, vegetarian, vegan, or fruitarian).
• Ordinal variable: The same as a nominal variable but the categories have a logical
order (e.g., whether people got a fail, a pass, a merit or a distinction in their exam)
• Continuous or Quantitative (entities get a distinct score):
• Interval variable: Equal intervals on the variable represent equal differences in the
property being measured (e.g., the difference between 6 and 8 is equivalent to the
difference between 13 and 15).
• Ratio variable: The same as an interval variable, but the ratios of scores on the scale
must also make sense (e.g., a score of 16 on an anxiety scale means that the person is,
in reality, twice as anxious as someone scoring 8). For this to be true, the scale must
have a meaningful zero point.
What is the level of measurement of the following variables?

• The number of downloads of different bands’ songs on iTunes

• The phone numbers that the bands obtained during registration

• The gender of the people giving the bands their phone numbers

• The instruments played by the band members

• The time they had spent learning to play their instruments


Analysing data
• We collect data from a smaller subset of the population, known as a sample, and use these data to infer things about the population as a whole. The bigger the sample, the more likely it is to reflect the whole population.
• The final stage of the research process is to analyse the data you have collected.
• The statistical analysis appropriate for a particular variable depends upon
whether the variable is categorical or quantitative.
• We can summarize categorical data by counting the number of observations in
each category or by computing the proportion of the observations in each
category.
• When the data are quantitative this involves both looking at your data graphically
to see what the general trends in the data are, and also fitting statistical models
to the data.
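As a minimal sketch of summarizing categorical data, the snippet below counts the observations in each category and converts the counts to proportions. The eight first-language codes are made-up illustration values, not taken from any dataset in these slides.

```python
from collections import Counter

# Hypothetical first-language codes for eight people (1 = English, 2 = other).
first_language = [1, 1, 2, 1, 2, 1, 1, 2]

# Count how many observations fall in each category.
counts = Counter(first_language)

# Convert each count to a proportion of all observations.
total = len(first_language)
proportions = {category: count / total for category, count in counts.items()}

print("Counts:", dict(counts))
print("Proportions:", proportions)
```

Counts answer "how many", while proportions make categories comparable across samples of different sizes.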
Caselet: How much do students expect to make?
• Ashnaa (hypothetical name), an aspiring MBA applicant, was particularly interested in the starting salaries of graduates. She found a dataset from a prominent MBA school in Germany.
• The data covered the class of 2023, who were surveyed about their satisfaction with the MBA program and their starting salaries. The survey responses were linked to existing data on the graduates, including age, sex, work experience, GMAT scores, MBA averages, quartile ranking, and native language.
• Ashnaa was pleased to find this data and hoped it could answer her key questions about starting salaries, the impact of gender and age, student satisfaction, and the influence of GMAT scores. Since her native language was not English, she had a relatively low GMAT score.
• Field: Description
• age: age in years
• sex: 1 = Male; 2 = Female
• gmat_tot: total GMAT score
• gmat_qpc: quantitative GMAT percentile
• gmat_vpc: verbal GMAT percentile
• qmat_tpc: overall GMAT percentile
• s_avg: spring MBA average
• f_avg: fall MBA average
• quarter: quartile ranking (1st is top, 4th is bottom)
• work_yrs: years of work experience
• frstlang: first language (1 = English; 2 = other)
• salary: starting salary
• satis: degree of satisfaction with MBA program (1 = low, 7 = high satisfaction)
• ..\..\Downloads\QM1 case desc.xlsx
Basic ways to look at and summarize the data you have collected.
• Frequency distributions: a graph of how many times each score occurs.
• Frequency distributions can be very useful for assessing properties of the distribution of scores. Frequency distributions come in many different shapes and sizes.
• In an ideal world our data would be distributed symmetrically around the centre of all scores, with most scores close to the centre. This is known as a normal distribution and is characterized by the bell-shaped curve with which you might already be familiar.
• We often use a normal distribution with a mean of 0 and a standard deviation of 1 as a standard.
• There are two main ways in which a distribution can deviate from normal: (1) lack of symmetry (called skew) and (2) pointiness (called kurtosis). (In a normal distribution the values of skew and excess kurtosis are 0.)
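The sketch below prints a text-only frequency distribution and computes skew and excess kurtosis for two small, made-up sets of scores. It assumes the simple population-moment formulas (dividing by n); statistical packages often apply small-sample corrections, so their values can differ slightly.

```python
from collections import Counter


def skew_and_excess_kurtosis(scores):
    """Return (skew, excess kurtosis) from population central moments."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in scores) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in scores) / n  # fourth central moment
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3  # minus 3 so a normal curve scores 0
    return skew, excess_kurtosis


def print_frequency_distribution(scores):
    """Print one row per distinct score, with one '#' per occurrence."""
    frequencies = Counter(scores)
    for value in sorted(frequencies):
        print(f"{value:>3} | {'#' * frequencies[value]}")


# Hypothetical scores: one symmetric set and one with a long right tail.
symmetric_scores = [1, 2, 2, 3, 3, 3, 4, 4, 5]
right_skewed_scores = [1, 1, 1, 2, 2, 3, 4, 6, 9]

symmetric_skew, symmetric_kurtosis = skew_and_excess_kurtosis(symmetric_scores)
right_skew, right_kurtosis = skew_and_excess_kurtosis(right_skewed_scores)

print_frequency_distribution(symmetric_scores)
print("skew:", symmetric_skew, "excess kurtosis:", symmetric_kurtosis)
print_frequency_distribution(right_skewed_scores)
print("skew:", right_skew, "excess kurtosis:", right_kurtosis)
```

A skew near 0 indicates symmetry, while a positive skew indicates a tail stretching toward high scores.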
Statistics of rolling dice

https://academo.org/demos/dice-roll-statistics/#:~:text=If%20you%20roll%20a%20fair,%22roll%20automatically%22%20button%20above.

https://www.youtube.com/watch?v=zeJD6dqJ5lo
Descriptive Statistics
▪ Numerical Measures
▪ Measures of Location
▪ Measures of Variability
Measures of Location
▪ Mean
▪ Median
▪ Mode
▪ Percentiles
▪ Quartiles
The measure of central tendency
• We can calculate where the centre of a frequency distribution lies
using three measures commonly used: the mean, the mode and the
median.
• The mean is the sum of all scores divided by the number of scores.
The value of the mean can be influenced quite heavily by extreme
scores. (The mean provides a measure of central location)
• The median is the middle score when the scores are placed in
ascending order. It is not as influenced by extreme scores as the
mean.
• The mode is the score that occurs most frequently.
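The three measures can be computed with Python's standard `statistics` module. The scores below are hypothetical values chosen only to illustrate the definitions.

```python
import statistics

# Hypothetical scores, used only to illustrate the three definitions.
scores = [2, 3, 3, 4, 5, 5, 5, 7]

mean_score = statistics.mean(scores)      # sum of scores / number of scores
median_score = statistics.median(scores)  # middle of the ordered scores
mode_score = statistics.mode(scores)      # most frequent score

print("mean:", mean_score)
print("median:", median_score)
print("mode:", mode_score)
```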
Business Scenario: Mean
• Suppose you want to run a campaign to advertise the racing bikes /
latest fashion trend at a location.

• The average age of people living in that area is 39 Years.

• Will you run the event?


Business Scenario: Mean
• The intuitive answer is No! The ‘average’ age of the people living in that area is 39 years, so it would not seem to make sense to sell them these products.
• After canceling the event you looked at the age data closely and found something like this:
Age: 25, 22, 23, 22, 19, 24, 26, 22, 21, 20, 20, 121, 24, 180, 16
• Most of the population is young! But there are 2 cases (121 and 180) which are abruptly different from all the other ages. Most probably these are data errors.
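A quick check of the scenario, using the 15 ages from the slide. The cut-off of 110 years used to drop implausible ages is an assumption made for illustration only.

```python
import statistics

# The 15 ages listed in the business scenario.
ages = [25, 22, 23, 22, 19, 24, 26, 22, 21, 20, 20, 121, 24, 180, 16]

mean_age = statistics.mean(ages)
median_age = statistics.median(ages)

# Assumption for illustration: ages above 110 are treated as data errors.
plausible_ages = [age for age in ages if age <= 110]
mean_plausible_age = statistics.mean(plausible_ages)

print("mean of all ages:", mean_age)
print("median of all ages:", median_age)
print("mean without implausible ages:", round(mean_plausible_age, 1))
```

The two extreme values pull the mean up to 39, while the median stays at 22, close to the mean of the remaining 13 ages.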
Limitations of using Mean
• The mean is affected by outliers (extreme values).
• This is why the mean alone can be a misleading summary of the typical value, and why statements like the ones below deserve a second look:
• ‘Average placement salary of students from our institute is 15 Lac per annum’
• ‘The mileage of our bike is 60 kmpl’
• In examples like these, a few extreme values may be enough to inflate the figure, catch attention and influence decision making.
• Tip!: When outliers are present, the mean does not always represent the central tendency or the general pattern of the data. An alternative is to use the median, which is less affected by extreme values.
Practical use- Sales Data: Use the mean for overall average sales but consider the median to understand typical sales
figures if there are outliers.
How and where to use median
• Recall the previous example

• Whenever a data set has extreme values, the median is the preferred
measure of central location. The median of a data set is the value in
the middle when the data items are arranged in ascending order.

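• As a general rule, use the median when the data set is uneven (skewed, or containing extreme values).
Median
• odd number of scores: the median is the middle score of the ordered data, at position (n + 1)/2
• even number of scores: the median is the average of the two middle scores, at positions n/2 and n/2 + 1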
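The odd and even cases can be sketched as a small function. The function name and the example scores are illustrative; in practice `statistics.median` from the standard library does the same job.

```python
def median(scores):
    """Return the median of a non-empty list of numbers."""
    if not scores:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(scores)  # arrange the scores in ascending order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of scores: take the single middle score.
        return ordered[middle]
    # Even number of scores: average the two middle scores.
    return (ordered[middle - 1] + ordered[middle]) / 2


odd_median = median([7, 1, 5])      # ordered: 1, 5, 7
even_median = median([7, 1, 5, 3])  # ordered: 1, 3, 5, 7

print("odd case:", odd_median)
print("even case:", even_median)
```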
The mean loan amount of the customers is 13,50,000 INR. But this amount is higher than the loan amount of 80% of the customers.

Customer    Loan Amount (in Rs)
1           8,00,000
2           8,50,000
3           9,00,000
4           9,50,000
5           32,50,000

The median amount is 9,00,000 INR: as many customers have a lower loan than this amount as have a higher one. So, the median represents the “average” concept in a better way.

However, the median is not an impeccable statistic. There are several things that we should consider when using it for communicating statistical information.

Practical use- Salary Analysis: Use the median to report typical salaries when there are a few extremely high or low
salaries that could skew the mean.
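The loan figures can be checked in a few lines. This sketch uses the five loan amounts from the table (written without the Indian digit grouping) and also computes the share of customers whose loan is below the mean.

```python
import statistics

# Loan amounts in Rs for customers 1 to 5, from the table above.
loans = [800_000, 850_000, 900_000, 950_000, 3_250_000]

mean_loan = statistics.mean(loans)
median_loan = statistics.median(loans)

# Share of customers whose loan is smaller than the mean loan.
share_below_mean = sum(loan < mean_loan for loan in loans) / len(loans)

print("mean loan:", mean_loan)
print("median loan:", median_loan)
print("share of customers below the mean:", share_below_mean)
```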
Limitations of using Median

Median does not convey the information of min and max values

Median may lead to a false impression

Median is not good for planning


The mode
• The mode is the score that occurs most frequently in the data set.
• This is easy to spot in a frequency distribution because it will be the tallest bar.

Practical use- Customer Preferences: Use the mode to identify the most preferred product or service feature.
Dispersion of distribution



The dispersion in a distribution
• The deviance or error is the distance of each score from the mean.
• The sum of squared errors is the total amount of error in the mean. The
errors/deviances are squared before adding them up.
• The variance is the average squared distance of scores from the mean. It is the sum of squares divided by the number of scores (or by the number of scores minus 1 when estimating the population variance from a sample). It tells us about how widely dispersed scores are around the mean.
• The standard deviation is the square root of the variance. It is the variance
converted back to the original units of measurement of the scores used to
compute it. Large standard deviations relative to the mean suggest data are
widely spread around the mean, whereas small standard deviations
suggest data are closely packed around the mean.
• The range is the distance between the highest and lowest score.
• The interquartile range is the range of the middle 50% of the scores.
• $\text{outcome}_i = b_0 + \text{error}_i$
• Our model is defined by parameters. We are not interested in the parameter values in our sample for their own sake; we care about the parameter values in the population.
• We can use the sample data to estimate what the population parameter
values are likely to be. That’s why we use the word ‘estimate’, because
when we calculate parameters based on sample data they are only
estimates of what the true parameter value is in the population.
The mean as a statistical model
▪ For example, if we took five of you and measured the number of
friends that you have, we might find the following data: 1, 2, 3, 3 and
4.
▪ Mean number of friends = (1 + 2 + 3 + 3 + 4)/5 = 13/5 = 2.6.
▪ But what is 2.6 of a friend?
▪ So the mean is a hypothetical value: it is a model created to summarize the data, and there will be error in its predictions.
▪ $\text{outcome}_i = b_0 + \text{error}_i$
▪ where $b_0$ is the mean of the outcome.
• The important thing is that we can use the value of the mean (or any
parameter) computed in our sample to estimate the value in the
population (which is the value in which we’re interested)
• For example, the first person has 1 friend:
• $1 = 2.6 + \text{error}_1$
• $\text{error}_1 = 1 - 2.6 = -1.6$
• You might notice that all we have done here is calculate the deviance. The deviance is another word for error. A more general way to think of the deviance or error is by rearranging the equation into:
• $\text{deviance}_i = \text{outcome}_i - \text{model}_i$
• We know the accuracy or ‘fit’ of the model for a particular person having
1 friend, but we want to know the fit of the model overall.
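• We can’t simply add the deviances, because some errors are positive and others negative, so we’d get a total of zero.
• One way around this problem is to square the errors before adding them up:
• $SS = \sum_{i=1}^{N} (\text{outcome}_i - \text{model}_i)^2 = (-1.6)^2 + (-0.6)^2 + (0.4)^2 + (0.4)^2 + (1.4)^2 = 5.20$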
• This equation shows how something we have used before (the sum of
squares) can be used to assess the total error in any model (not just the
mean).
• Although, the sum of squared errors (SS) is a good measure of the
accuracy of our model, it depends upon the quantity of data that has
been collected – the more data points, the higher the SS.
• To estimate the mean error in the population we need to divide not by the number of scores contributing to the total, but by the degrees of freedom (df), which is the number of scores used to compute the total adjusted for the fact that we’re trying to estimate the population value:
• $\text{mean squared error} = \dfrac{SS}{df} = \dfrac{\sum_{i=1}^{N} (\text{outcome}_i - \text{model}_i)^2}{N - 1}$
• Our model is the mean, so let’s replace the ‘model’ with the mean ($\bar{x}$), and the ‘outcome’ with the letter $x$ (to represent a score on the outcome):
• $s^2 = \dfrac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1} = \dfrac{5.20}{4} = 1.30$
• The mean squared error is also known as the variance.
• Both measures give us an idea of how well a model fits the data: large values relative to the model indicate a lack of fit.
• The standard deviation is the square root of the variance. For the friends data, $s = \sqrt{5.20/4} = \sqrt{1.30} \approx 1.14$.
• The sum of squares, variance and standard deviation are all measures of the dispersion or spread of data around the mean.
• A small standard deviation (relative to the value of the mean
itself) indicates that the data points are close to the mean.
• A large standard deviation (relative to the mean) indicates
that the data points are distant from the mean.
• A standard deviation of 0 would mean ?
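The friends example can be worked through step by step. This sketch uses the five scores from the slides (1, 2, 3, 3, 4) with the mean as the model, and divides by N - 1 degrees of freedom; the variable names are illustrative.

```python
import math
import statistics

# Number of friends for the five people in the example.
friends = [1, 2, 3, 3, 4]

# The model is the mean: it predicts 2.6 friends for everyone.
model = sum(friends) / len(friends)

# Deviance (error) = outcome - model, one value per person.
deviances = [outcome - model for outcome in friends]

# Positive and negative deviances cancel, so square them before adding.
sum_of_squares = sum(deviance ** 2 for deviance in deviances)

# Variance = SS / df, with df = N - 1 when estimating the population value.
degrees_of_freedom = len(friends) - 1
variance = sum_of_squares / degrees_of_freedom

# Standard deviation = square root of the variance (original units).
standard_deviation = math.sqrt(variance)

print("mean:", model)
print("sum of deviances:", round(sum(deviances), 10))
print("sum of squares:", round(sum_of_squares, 2))
print("variance:", round(variance, 2))
print("standard deviation:", round(standard_deviation, 2))

# The standard library gives the same sample variance.
library_variance = statistics.variance(friends)
```

Changing `friends` to five identical scores makes every deviance 0, which is one way to explore the question about a standard deviation of 0.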
68–95–99.7 rule
• In statistics, the 68–95–99.7 rule, also known as the empirical rule, states that in a normal distribution approximately 68%, 95%, and 99.7% of the values lie within one, two, and three standard deviations of the mean, respectively.
For example, suppose the height of students in a college is normally distributed with a mean of 5.5 ft and a standard deviation of 0.5 ft. Then about 68% of students are between 5.0 and 6.0 ft tall, about 95% between 4.5 and 6.5 ft, and about 99.7% between 4.0 and 7.0 ft.
Ques: A normal distribution has a standard deviation of 10 and a mean of 70. Approximately what area is contained between 70 and 90?
Ques: For a normal distribution with a mean of 0 and a standard deviation of 1, what area is contained between -2 and 1?
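One way to check answers to questions like these is to compute the areas from the normal cumulative distribution function. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the variable names are illustrative, and the exact areas differ slightly from the rounded 68–95–99.7 figures.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Area within 1, 2 and 3 standard deviations of the mean.
area_within = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)
}

# First question: mean 70, standard deviation 10, area between 70 and 90.
scores = NormalDist(mu=70, sigma=10)
area_70_to_90 = scores.cdf(90) - scores.cdf(70)

# Second question: standard normal, area between -2 and 1.
area_minus2_to_1 = standard_normal.cdf(1) - standard_normal.cdf(-2)

for k, area in area_within.items():
    print(f"within {k} standard deviation(s): {area:.4f}")
print(f"between 70 and 90: {area_70_to_90:.4f}")
print(f"between -2 and 1: {area_minus2_to_1:.4f}")
```

With the empirical rule alone, the first area is half of 95% (about 47.5%) and the second is half of 95% plus half of 68% (about 81.5%).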
Quartile
• Quartiles are specific percentiles.
• First Quartile = 25th Percentile
• Second Quartile = 50th Percentile = Median
• Third Quartile = 75th Percentile
Spread of Data Scores
• The range can also be calculated excluding values at the extremes of the distribution. One convention is to cut off the top and bottom 25% of scores and calculate the range of the middle 50% of scores, known as the interquartile range.
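A minimal sketch of quartiles as percentiles. Several conventions exist for locating a percentile; this one assumes linear interpolation between the ordered scores, so other tools may return slightly different quartiles for the same data. The scores are hypothetical.

```python
def percentile(scores, p):
    """Return the p-th percentile (0-100) using linear interpolation."""
    if not scores:
        raise ValueError("percentile of an empty data set is undefined")
    ordered = sorted(scores)
    rank = (len(ordered) - 1) * p / 100  # fractional position in the ordered list
    lower = int(rank)                    # index of the score at or below the rank
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


# Hypothetical scores, used only for illustration.
scores = [2, 4, 4, 5, 7, 9, 10, 12, 15]

first_quartile = percentile(scores, 25)
second_quartile = percentile(scores, 50)  # the median
third_quartile = percentile(scores, 75)

full_range = max(scores) - min(scores)
interquartile_range = third_quartile - first_quartile

print("Q1, Q2, Q3:", first_quartile, second_quartile, third_quartile)
print("range:", full_range)
print("interquartile range:", interquartile_range)
```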
Box plot
• A box plot is a graphical summary of data that is based on a five-number
summary.
• A key to the development of a box plot is the computation of the median and
the quartiles Q1 and Q3
• Box plots provide another way to identify outliers
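The five-number summary behind a box plot can be sketched with the ages from the earlier business scenario. The 1.5 × IQR fences used to flag outliers are a common convention assumed here, not something stated in these slides, and the quartiles use the "inclusive" method of `statistics.quantiles`.

```python
import statistics

# Ages from the business scenario earlier in the deck.
ages = [25, 22, 23, 22, 19, 24, 26, 22, 21, 20, 20, 121, 24, 180, 16]

# Quartiles Q1, Q2 (median) and Q3.
q1, median_age, q3 = statistics.quantiles(ages, n=4, method="inclusive")

# Five-number summary: minimum, Q1, median, Q3, maximum.
five_number_summary = (min(ages), q1, median_age, q3, max(ages))

# Assumed convention: points beyond 1.5 * IQR from the box are outliers.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [age for age in ages if age < lower_fence or age > upper_fence]

print("five-number summary:", five_number_summary)
print("fences:", lower_fence, upper_fence)
print("outliers:", outliers)
```

Under these assumptions the two suspicious ages, 121 and 180, are the only values flagged.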



• Compare their graphs: the ratings for lecturer 1 were consistently close to
the mean rating, indicating that the mean is a good representation of the
observed data – it is a good fit. The ratings for lecturer 2, however, were
more spread out from the mean: for some lectures, she received very high
ratings, and for others her ratings were terrible. Therefore, the mean is not
such a good representation of the observed scores – it is a poor fit.
EXAMPLE

